{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Tce3stUlHN0L"
      },
      "source": [
        "##### Copyright 2024 Google LLC."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "cellView": "form",
        "id": "tuOe1ymfHZPu"
      },
      "outputs": [],
      "source": [
        "# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dfsDR_omdNea"
      },
      "source": [
        "# Gemma - finetune with Axolotl\n",
        "\n",
        "This notebook demonstrates how to finetune Gemma with [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl), a tool designed to streamline the fine-tuning of various AI models, with support for multiple configurations and architectures. Axolotl wraps the Hugging Face finetuning functionality and exposes it through a simple interface.\n",
        "This notebook closely follows the [official Colab notebook](https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/examples/colab-notebooks/colab-axolotl-example.ipynb).\n",
        "\n",
        "<table align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Gemma/[Gemma_2]Finetune_with_Axolotl.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MwMiP7jDdAL1"
      },
      "source": [
        "## Setup\n",
        "\n",
        "### Select the Colab runtime\n",
        "To complete this tutorial, you'll need to have a Colab runtime with sufficient resources to run the Gemma model. In this case, you can use a T4 GPU:\n",
        "\n",
        "1. In the upper-right of the Colab window, select **▾ (Additional connection options)**.\n",
        "2. Select **Change runtime type**.\n",
        "3. Under **Hardware accelerator**, select **T4 GPU**."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fyBAK3oxH4Z0"
      },
      "source": [
        "### Install PyTorch"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "id": "4pY14h6_bDrr"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Collecting torch==2.1.2\n",
            "  Downloading torch-2.1.2-cp310-cp310-manylinux1_x86_64.whl (670.2 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m670.2/670.2 MB\u001b[0m \u001b[31m2.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2) (3.14.0)\n",
            "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2) (4.12.0)\n",
            "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2) (1.12.1)\n",
            "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2) (3.3)\n",
            "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2) (3.1.4)\n",
            "Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2) (2023.6.0)\n",
            "Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.1.2)\n",
            "  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)\n",
            "Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch==2.1.2)\n",
            "  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)\n",
            "Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch==2.1.2)\n",
            "  Using cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)\n",
            "Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch==2.1.2)\n",
            "  Using cached nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)\n",
            "Collecting nvidia-cublas-cu12==12.1.3.1 (from torch==2.1.2)\n",
            "  Using cached nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)\n",
            "Collecting nvidia-cufft-cu12==11.0.2.54 (from torch==2.1.2)\n",
            "  Using cached nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)\n",
            "Collecting nvidia-curand-cu12==10.3.2.106 (from torch==2.1.2)\n",
            "  Using cached nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)\n",
            "Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch==2.1.2)\n",
            "  Using cached nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)\n",
            "Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch==2.1.2)\n",
            "  Using cached nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)\n",
            "Collecting nvidia-nccl-cu12==2.18.1 (from torch==2.1.2)\n",
            "  Downloading nvidia_nccl_cu12-2.18.1-py3-none-manylinux1_x86_64.whl (209.8 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m209.8/209.8 MB\u001b[0m \u001b[31m3.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting nvidia-nvtx-cu12==12.1.105 (from torch==2.1.2)\n",
            "  Using cached nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)\n",
            "Collecting triton==2.1.0 (from torch==2.1.2)\n",
            "  Downloading triton-2.1.0-0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89.2 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.2/89.2 MB\u001b[0m \u001b[31m18.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107->torch==2.1.2)\n",
            "  Downloading nvidia_nvjitlink_cu12-12.5.40-py3-none-manylinux2014_x86_64.whl (21.3 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m21.3/21.3 MB\u001b[0m \u001b[31m59.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch==2.1.2) (2.1.5)\n",
            "Requirement already satisfied: mpmath<1.4.0,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy->torch==2.1.2) (1.3.0)\n",
            "Installing collected packages: triton, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, nvidia-cusparse-cu12, nvidia-cudnn-cu12, nvidia-cusolver-cu12, torch\n",
            "  Attempting uninstall: triton\n",
            "    Found existing installation: triton 2.3.0\n",
            "    Uninstalling triton-2.3.0:\n",
            "      Successfully uninstalled triton-2.3.0\n",
            "  Attempting uninstall: torch\n",
            "    Found existing installation: torch 2.3.0+cu121\n",
            "    Uninstalling torch-2.3.0+cu121:\n",
            "      Successfully uninstalled torch-2.3.0+cu121\n",
            "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
            "torchaudio 2.3.0+cu121 requires torch==2.3.0, but you have torch 2.1.2 which is incompatible.\n",
            "torchtext 0.18.0 requires torch>=2.3.0, but you have torch 2.1.2 which is incompatible.\n",
            "torchvision 0.18.0+cu121 requires torch==2.3.0, but you have torch 2.1.2 which is incompatible.\u001b[0m\u001b[31m\n",
            "\u001b[0mSuccessfully installed nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.18.1 nvidia-nvjitlink-cu12-12.5.40 nvidia-nvtx-cu12-12.1.105 torch-2.1.2 triton-2.1.0\n"
          ]
        }
      ],
      "source": [
        "!pip install torch==\"2.1.2\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8hrOzmO0HOo8"
      },
      "source": [
        "### Restart the Colab runtime\n",
        "\n",
        "At this point, restart your Colab runtime so that the newly installed PyTorch version takes effect."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Mg2cHBtNH6_1"
      },
      "source": [
        "### Install Axolotl"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "id": "h7IcTglsIABR"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Obtaining axolotl from git+https://github.com/OpenAccess-AI-Collective/axolotl#egg=axolotl\n",
            "  Cloning https://github.com/OpenAccess-AI-Collective/axolotl to ./src/axolotl\n",
            "  Running command git clone --filter=blob:none --quiet https://github.com/OpenAccess-AI-Collective/axolotl /content/src/axolotl\n",
            "  Resolved https://github.com/OpenAccess-AI-Collective/axolotl to commit 1f151c0d52d2d4c78c5e1b1a4ff4fb64cba1f45d\n",
            "  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting fschat@ git+https://github.com/lm-sys/FastChat.git@27a05b04a35510afb1d767ae7e5990cbd278f8fe (from axolotl)\n",
            "  Cloning https://github.com/lm-sys/FastChat.git (to revision 27a05b04a35510afb1d767ae7e5990cbd278f8fe) to /tmp/pip-install-j71xwbh1/fschat_d8a49c65bcf3440d862b4934c0307124\n",
            "  Running command git clone --filter=blob:none --quiet https://github.com/lm-sys/FastChat.git /tmp/pip-install-j71xwbh1/fschat_d8a49c65bcf3440d862b4934c0307124\n",
            "  Running command git rev-parse -q --verify 'sha^27a05b04a35510afb1d767ae7e5990cbd278f8fe'\n",
            "  Running command git fetch -q https://github.com/lm-sys/FastChat.git 27a05b04a35510afb1d767ae7e5990cbd278f8fe\n",
            "  Running command git checkout -q 27a05b04a35510afb1d767ae7e5990cbd278f8fe\n",
            "  Resolved https://github.com/lm-sys/FastChat.git to commit 27a05b04a35510afb1d767ae7e5990cbd278f8fe\n",
            "  Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
            "  Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
            "  Installing backend dependencies ... \u001b[?25l\u001b[?25hdone\n",
            "  Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting packaging==23.2 (from axolotl)\n",
            "  Downloading packaging-23.2-py3-none-any.whl (53 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m53.0/53.0 kB\u001b[0m \u001b[31m2.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting peft==0.11.1 (from axolotl)\n",
            "  Downloading peft-0.11.1-py3-none-any.whl (251 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m251.6/251.6 kB\u001b[0m \u001b[31m6.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: transformers==4.41.1 in /usr/local/lib/python3.10/dist-packages (from axolotl) (4.41.1)\n",
            "Requirement already satisfied: tokenizers==0.19.1 in /usr/local/lib/python3.10/dist-packages (from axolotl) (0.19.1)\n",
            "Collecting bitsandbytes==0.43.1 (from axolotl)\n",
            "  Downloading bitsandbytes-0.43.1-py3-none-manylinux_2_24_x86_64.whl (119.8 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m119.8/119.8 MB\u001b[0m \u001b[31m14.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting accelerate==0.30.1 (from axolotl)\n",
            "  Downloading accelerate-0.30.1-py3-none-any.whl (302 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m302.6/302.6 kB\u001b[0m \u001b[31m44.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting pydantic==2.6.3 (from axolotl)\n",
            "  Downloading pydantic-2.6.3-py3-none-any.whl (395 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m395.2/395.2 kB\u001b[0m \u001b[31m47.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting addict (from axolotl)\n",
            "  Downloading addict-2.4.0-py3-none-any.whl (3.8 kB)\n",
            "Collecting fire (from axolotl)\n",
            "  Downloading fire-0.6.0.tar.gz (88 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m88.4/88.4 kB\u001b[0m \u001b[31m17.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Requirement already satisfied: PyYAML>=6.0 in /usr/local/lib/python3.10/dist-packages (from axolotl) (6.0.1)\n",
            "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from axolotl) (2.31.0)\n",
            "Collecting datasets==2.19.1 (from axolotl)\n",
            "  Downloading datasets-2.19.1-py3-none-any.whl (542 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m542.0/542.0 kB\u001b[0m \u001b[31m54.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: sentencepiece in /usr/local/lib/python3.10/dist-packages (from axolotl) (0.1.99)\n",
            "Collecting wandb (from axolotl)\n",
            "  Downloading wandb-0.17.0-py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.7 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m6.7/6.7 MB\u001b[0m \u001b[31m70.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting einops (from axolotl)\n",
            "  Downloading einops-0.8.0-py3-none-any.whl (43 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.2/43.2 kB\u001b[0m \u001b[31m7.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting optimum==1.16.2 (from axolotl)\n",
            "  Downloading optimum-1.16.2-py3-none-any.whl (402 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m402.5/402.5 kB\u001b[0m \u001b[31m49.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting hf_transfer (from axolotl)\n",
            "  Downloading hf_transfer-0.1.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.4 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m4.4/4.4 MB\u001b[0m \u001b[31m85.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting colorama (from axolotl)\n",
            "  Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)\n",
            "Requirement already satisfied: numba in /usr/local/lib/python3.10/dist-packages (from axolotl) (0.58.1)\n",
            "Requirement already satisfied: numpy>=1.24.4 in /usr/local/lib/python3.10/dist-packages (from axolotl) (1.25.2)\n",
            "Collecting evaluate==0.4.1 (from axolotl)\n",
            "  Downloading evaluate-0.4.1-py3-none-any.whl (84 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m84.1/84.1 kB\u001b[0m \u001b[31m16.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from axolotl) (1.11.4)\n",
            "Requirement already satisfied: scikit-learn==1.2.2 in /usr/local/lib/python3.10/dist-packages (from axolotl) (1.2.2)\n",
            "Collecting pynvml (from axolotl)\n",
            "  Downloading pynvml-11.5.0-py3-none-any.whl (53 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m53.1/53.1 kB\u001b[0m \u001b[31m10.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting art (from axolotl)\n",
            "  Downloading art-6.2-py3-none-any.whl (601 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m601.8/601.8 kB\u001b[0m \u001b[31m66.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting gradio==3.50.2 (from axolotl)\n",
            "  Downloading gradio-3.50.2-py3-none-any.whl (20.3 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m20.3/20.3 MB\u001b[0m \u001b[31m71.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: tensorboard in /usr/local/lib/python3.10/dist-packages (from axolotl) (2.15.2)\n",
            "Collecting s3fs (from axolotl)\n",
            "  Downloading s3fs-2024.6.0-py3-none-any.whl (29 kB)\n",
            "Requirement already satisfied: gcsfs in /usr/local/lib/python3.10/dist-packages (from axolotl) (2023.6.0)\n",
            "Collecting trl==0.8.6 (from axolotl)\n",
            "  Downloading trl-0.8.6-py3-none-any.whl (245 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m245.2/245.2 kB\u001b[0m \u001b[31m36.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting zstandard==0.22.0 (from axolotl)\n",
            "  Downloading zstandard-0.22.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.4 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m5.4/5.4 MB\u001b[0m \u001b[31m103.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: fastcore in /usr/local/lib/python3.10/dist-packages (from axolotl) (1.5.41)\n",
            "Requirement already satisfied: torch==2.1.2 in /usr/local/lib/python3.10/dist-packages (from axolotl) (2.1.2)\n",
            "Collecting xformers>=0.0.23.post1 (from axolotl)\n",
            "  Downloading xformers-0.0.26.post1-cp310-cp310-manylinux2014_x86_64.whl (222.7 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m222.7/222.7 MB\u001b[0m \u001b[31m3.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate==0.30.1->axolotl) (5.9.5)\n",
            "Requirement already satisfied: huggingface-hub in /usr/local/lib/python3.10/dist-packages (from accelerate==0.30.1->axolotl) (0.23.2)\n",
            "Requirement already satisfied: safetensors>=0.3.1 in /usr/local/lib/python3.10/dist-packages (from accelerate==0.30.1->axolotl) (0.4.3)\n",
            "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from datasets==2.19.1->axolotl) (3.14.0)\n",
            "Requirement already satisfied: pyarrow>=12.0.0 in /usr/local/lib/python3.10/dist-packages (from datasets==2.19.1->axolotl) (14.0.2)\n",
            "Requirement already satisfied: pyarrow-hotfix in /usr/local/lib/python3.10/dist-packages (from datasets==2.19.1->axolotl) (0.6)\n",
            "Collecting dill<0.3.9,>=0.3.0 (from datasets==2.19.1->axolotl)\n",
            "  Downloading dill-0.3.8-py3-none-any.whl (116 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m116.3/116.3 kB\u001b[0m \u001b[31m20.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from datasets==2.19.1->axolotl) (2.0.3)\n",
            "Requirement already satisfied: tqdm>=4.62.1 in /usr/local/lib/python3.10/dist-packages (from datasets==2.19.1->axolotl) (4.66.4)\n",
            "Collecting xxhash (from datasets==2.19.1->axolotl)\n",
            "  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m194.1/194.1 kB\u001b[0m \u001b[31m30.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting multiprocess (from datasets==2.19.1->axolotl)\n",
            "  Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m23.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: fsspec[http]<=2024.3.1,>=2023.1.0 in /usr/local/lib/python3.10/dist-packages (from datasets==2.19.1->axolotl) (2023.6.0)\n",
            "Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from datasets==2.19.1->axolotl) (3.9.5)\n",
            "Collecting responses<0.19 (from evaluate==0.4.1->axolotl)\n",
            "  Downloading responses-0.18.0-py3-none-any.whl (38 kB)\n",
            "Collecting aiofiles<24.0,>=22.0 (from gradio==3.50.2->axolotl)\n",
            "  Downloading aiofiles-23.2.1-py3-none-any.whl (15 kB)\n",
            "Requirement already satisfied: altair<6.0,>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from gradio==3.50.2->axolotl) (4.2.2)\n",
            "Collecting fastapi (from gradio==3.50.2->axolotl)\n",
            "  Downloading fastapi-0.111.0-py3-none-any.whl (91 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m92.0/92.0 kB\u001b[0m \u001b[31m17.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting ffmpy (from gradio==3.50.2->axolotl)\n",
            "  Downloading ffmpy-0.3.2.tar.gz (5.5 kB)\n",
            "  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting gradio-client==0.6.1 (from gradio==3.50.2->axolotl)\n",
            "  Downloading gradio_client-0.6.1-py3-none-any.whl (299 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m299.2/299.2 kB\u001b[0m \u001b[31m43.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting httpx (from gradio==3.50.2->axolotl)\n",
            "  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m75.6/75.6 kB\u001b[0m \u001b[31m13.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: importlib-resources<7.0,>=1.3 in /usr/local/lib/python3.10/dist-packages (from gradio==3.50.2->axolotl) (6.4.0)\n",
            "Requirement already satisfied: jinja2<4.0 in /usr/local/lib/python3.10/dist-packages (from gradio==3.50.2->axolotl) (3.1.4)\n",
            "Requirement already satisfied: markupsafe~=2.0 in /usr/local/lib/python3.10/dist-packages (from gradio==3.50.2->axolotl) (2.1.5)\n",
            "Requirement already satisfied: matplotlib~=3.0 in /usr/local/lib/python3.10/dist-packages (from gradio==3.50.2->axolotl) (3.7.1)\n",
            "Collecting orjson~=3.0 (from gradio==3.50.2->axolotl)\n",
            "  Downloading orjson-3.10.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (142 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m142.5/142.5 kB\u001b[0m \u001b[31m23.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: pillow<11.0,>=8.0 in /usr/local/lib/python3.10/dist-packages (from gradio==3.50.2->axolotl) (9.4.0)\n",
            "Collecting pydub (from gradio==3.50.2->axolotl)\n",
            "  Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)\n",
            "Collecting python-multipart (from gradio==3.50.2->axolotl)\n",
            "  Downloading python_multipart-0.0.9-py3-none-any.whl (22 kB)\n",
            "Collecting semantic-version~=2.0 (from gradio==3.50.2->axolotl)\n",
            "  Downloading semantic_version-2.10.0-py2.py3-none-any.whl (15 kB)\n",
            "Requirement already satisfied: typing-extensions~=4.0 in /usr/local/lib/python3.10/dist-packages (from gradio==3.50.2->axolotl) (4.12.0)\n",
            "Collecting uvicorn>=0.14.0 (from gradio==3.50.2->axolotl)\n",
            "  Downloading uvicorn-0.30.1-py3-none-any.whl (62 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m62.4/62.4 kB\u001b[0m \u001b[31m11.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting websockets<12.0,>=10.0 (from gradio==3.50.2->axolotl)\n",
            "  Downloading websockets-11.0.3-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (129 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m129.9/129.9 kB\u001b[0m \u001b[31m24.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting coloredlogs (from optimum==1.16.2->axolotl)\n",
            "  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m46.0/46.0 kB\u001b[0m \u001b[31m8.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from optimum==1.16.2->axolotl) (1.12.1)\n",
            "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic==2.6.3->axolotl) (0.7.0)\n",
            "Collecting pydantic-core==2.16.3 (from pydantic==2.6.3->axolotl)\n",
            "  Downloading pydantic_core-2.16.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.2 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.2/2.2 MB\u001b[0m \u001b[31m61.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from scikit-learn==1.2.2->axolotl) (1.4.2)\n",
            "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn==1.2.2->axolotl) (3.5.0)\n",
            "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2->axolotl) (3.3)\n",
            "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2->axolotl) (12.1.105)\n",
            "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2->axolotl) (12.1.105)\n",
            "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2->axolotl) (12.1.105)\n",
            "Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2->axolotl) (8.9.2.26)\n",
            "Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2->axolotl) (12.1.3.1)\n",
            "Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2->axolotl) (11.0.2.54)\n",
            "Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2->axolotl) (10.3.2.106)\n",
            "Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2->axolotl) (11.4.5.107)\n",
            "Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2->axolotl) (12.1.0.106)\n",
            "Requirement already satisfied: nvidia-nccl-cu12==2.18.1 in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2->axolotl) (2.18.1)\n",
            "Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2->axolotl) (12.1.105)\n",
            "Requirement already satisfied: triton==2.1.0 in /usr/local/lib/python3.10/dist-packages (from torch==2.1.2->axolotl) (2.1.0)\n",
            "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.41.1->axolotl) (2024.5.15)\n",
            "Collecting tyro>=0.5.11 (from trl==0.8.6->axolotl)\n",
            "  Downloading tyro-0.8.4-py3-none-any.whl (102 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m102.4/102.4 kB\u001b[0m \u001b[31m18.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: nvidia-nvjitlink-cu12 in /usr/local/lib/python3.10/dist-packages (from nvidia-cusolver-cu12==11.4.5.107->torch==2.1.2->axolotl) (12.5.40)\n",
            "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->axolotl) (3.3.2)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->axolotl) (3.7)\n",
            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->axolotl) (2.0.7)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->axolotl) (2024.2.2)\n",
            "INFO: pip is looking at multiple versions of xformers to determine which version is compatible with other requirements. This could take a while.\n",
            "Collecting xformers>=0.0.23.post1 (from axolotl)\n",
            "  Downloading xformers-0.0.25.post1-cp310-cp310-manylinux2014_x86_64.whl (222.5 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m222.5/222.5 MB\u001b[0m \u001b[31m4.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading xformers-0.0.25-cp310-cp310-manylinux2014_x86_64.whl (222.5 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m222.5/222.5 MB\u001b[0m \u001b[31m4.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading xformers-0.0.24-cp310-cp310-manylinux2014_x86_64.whl (218.2 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m218.2/218.2 MB\u001b[0m \u001b[31m5.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading xformers-0.0.23.post1-cp310-cp310-manylinux2014_x86_64.whl (213.0 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m213.0/213.0 MB\u001b[0m \u001b[31m4.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: six in /usr/local/lib/python3.10/dist-packages (from fire->axolotl) (1.16.0)\n",
            "Requirement already satisfied: termcolor in /usr/local/lib/python3.10/dist-packages (from fire->axolotl) (2.4.0)\n",
            "Collecting markdown2[all] (from fschat@ git+https://github.com/lm-sys/FastChat.git@27a05b04a35510afb1d767ae7e5990cbd278f8fe->axolotl)\n",
            "  Downloading markdown2-2.4.13-py2.py3-none-any.whl (41 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m41.3/41.3 kB\u001b[0m \u001b[31m7.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting nh3 (from fschat@ git+https://github.com/lm-sys/FastChat.git@27a05b04a35510afb1d767ae7e5990cbd278f8fe->axolotl)\n",
            "  Downloading nh3-0.2.17-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (777 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m777.1/777.1 kB\u001b[0m \u001b[31m76.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: prompt-toolkit>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from fschat@ git+https://github.com/lm-sys/FastChat.git@27a05b04a35510afb1d767ae7e5990cbd278f8fe->axolotl) (3.0.45)\n",
            "Requirement already satisfied: rich>=10.0.0 in /usr/local/lib/python3.10/dist-packages (from fschat@ git+https://github.com/lm-sys/FastChat.git@27a05b04a35510afb1d767ae7e5990cbd278f8fe->axolotl) (13.7.1)\n",
            "Collecting shortuuid (from fschat@ git+https://github.com/lm-sys/FastChat.git@27a05b04a35510afb1d767ae7e5990cbd278f8fe->axolotl)\n",
            "  Downloading shortuuid-1.0.13-py3-none-any.whl (10 kB)\n",
            "Collecting tiktoken (from fschat@ git+https://github.com/lm-sys/FastChat.git@27a05b04a35510afb1d767ae7e5990cbd278f8fe->axolotl)\n",
            "  Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.1/1.1 MB\u001b[0m \u001b[31m87.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: decorator>4.1.2 in /usr/local/lib/python3.10/dist-packages (from gcsfs->axolotl) (4.4.2)\n",
            "Requirement already satisfied: google-auth>=1.2 in /usr/local/lib/python3.10/dist-packages (from gcsfs->axolotl) (2.27.0)\n",
            "Requirement already satisfied: google-auth-oauthlib in /usr/local/lib/python3.10/dist-packages (from gcsfs->axolotl) (1.2.0)\n",
            "Requirement already satisfied: google-cloud-storage in /usr/local/lib/python3.10/dist-packages (from gcsfs->axolotl) (2.8.0)\n",
            "Requirement already satisfied: llvmlite<0.42,>=0.41.0dev0 in /usr/local/lib/python3.10/dist-packages (from numba->axolotl) (0.41.1)\n",
            "Collecting aiobotocore<3.0.0,>=2.5.4 (from s3fs->axolotl)\n",
            "  Downloading aiobotocore-2.13.0-py3-none-any.whl (76 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m76.6/76.6 kB\u001b[0m \u001b[31m14.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hINFO: pip is looking at multiple versions of s3fs to determine which version is compatible with other requirements. This could take a while.\n",
            "Collecting s3fs (from axolotl)\n",
            "  Downloading s3fs-2024.5.0-py3-none-any.whl (29 kB)\n",
            "  Downloading s3fs-2024.3.1-py3-none-any.whl (29 kB)\n",
            "  Downloading s3fs-2024.3.0-py3-none-any.whl (29 kB)\n",
            "  Downloading s3fs-2024.2.0-py3-none-any.whl (28 kB)\n",
            "  Downloading s3fs-2023.12.2-py3-none-any.whl (28 kB)\n",
            "  Downloading s3fs-2023.12.1-py3-none-any.whl (28 kB)\n",
            "  Downloading s3fs-2023.10.0-py3-none-any.whl (28 kB)\n",
            "Collecting aiobotocore~=2.7.0 (from s3fs->axolotl)\n",
            "  Downloading aiobotocore-2.7.0-py3-none-any.whl (73 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m73.5/73.5 kB\u001b[0m \u001b[31m13.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hINFO: pip is looking at multiple versions of s3fs to determine which version is compatible with other requirements. This could take a while.\n",
            "Collecting s3fs (from axolotl)\n",
            "  Downloading s3fs-2023.9.2-py3-none-any.whl (28 kB)\n",
            "Collecting aiobotocore~=2.5.4 (from s3fs->axolotl)\n",
            "  Downloading aiobotocore-2.5.4-py3-none-any.whl (73 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m73.4/73.4 kB\u001b[0m \u001b[31m14.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting s3fs (from axolotl)\n",
            "  Downloading s3fs-2023.9.1-py3-none-any.whl (28 kB)\n",
            "  Downloading s3fs-2023.9.0-py3-none-any.whl (28 kB)\n",
            "  Downloading s3fs-2023.6.0-py3-none-any.whl (28 kB)\n",
            "Requirement already satisfied: absl-py>=0.4 in /usr/local/lib/python3.10/dist-packages (from tensorboard->axolotl) (1.4.0)\n",
            "Requirement already satisfied: grpcio>=1.48.2 in /usr/local/lib/python3.10/dist-packages (from tensorboard->axolotl) (1.64.0)\n",
            "Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.10/dist-packages (from tensorboard->axolotl) (3.6)\n",
            "Requirement already satisfied: protobuf!=4.24.0,>=3.19.6 in /usr/local/lib/python3.10/dist-packages (from tensorboard->axolotl) (3.20.3)\n",
            "Requirement already satisfied: setuptools>=41.0.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard->axolotl) (67.7.2)\n",
            "Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard->axolotl) (0.7.2)\n",
            "Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from tensorboard->axolotl) (3.0.3)\n",
            "Requirement already satisfied: click!=8.0.0,>=7.1 in /usr/local/lib/python3.10/dist-packages (from wandb->axolotl) (8.1.7)\n",
            "Collecting docker-pycreds>=0.4.0 (from wandb->axolotl)\n",
            "  Downloading docker_pycreds-0.4.0-py2.py3-none-any.whl (9.0 kB)\n",
            "Collecting gitpython!=3.1.29,>=1.0.0 (from wandb->axolotl)\n",
            "  Downloading GitPython-3.1.43-py3-none-any.whl (207 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m207.3/207.3 kB\u001b[0m \u001b[31m34.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: platformdirs in /usr/local/lib/python3.10/dist-packages (from wandb->axolotl) (4.2.2)\n",
            "Collecting sentry-sdk>=1.0.0 (from wandb->axolotl)\n",
            "  Downloading sentry_sdk-2.3.1-py2.py3-none-any.whl (289 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m289.0/289.0 kB\u001b[0m \u001b[31m42.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting setproctitle (from wandb->axolotl)\n",
            "  Downloading setproctitle-1.3.3-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (30 kB)\n",
            "Collecting botocore<1.31.18,>=1.31.17 (from aiobotocore~=2.5.4->s3fs->axolotl)\n",
            "  Downloading botocore-1.31.17-py3-none-any.whl (11.1 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m11.1/11.1 MB\u001b[0m \u001b[31m108.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: wrapt<2.0.0,>=1.10.10 in /usr/local/lib/python3.10/dist-packages (from aiobotocore~=2.5.4->s3fs->axolotl) (1.14.1)\n",
            "Collecting aioitertools<1.0.0,>=0.5.1 (from aiobotocore~=2.5.4->s3fs->axolotl)\n",
            "  Downloading aioitertools-0.11.0-py3-none-any.whl (23 kB)\n",
            "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets==2.19.1->axolotl) (1.3.1)\n",
            "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets==2.19.1->axolotl) (23.2.0)\n",
            "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets==2.19.1->axolotl) (1.4.1)\n",
            "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets==2.19.1->axolotl) (6.0.5)\n",
            "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets==2.19.1->axolotl) (1.9.4)\n",
            "Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets==2.19.1->axolotl) (4.0.3)\n",
            "Requirement already satisfied: entrypoints in /usr/local/lib/python3.10/dist-packages (from altair<6.0,>=4.2.0->gradio==3.50.2->axolotl) (0.4)\n",
            "Requirement already satisfied: jsonschema>=3.0 in /usr/local/lib/python3.10/dist-packages (from altair<6.0,>=4.2.0->gradio==3.50.2->axolotl) (4.19.2)\n",
            "Requirement already satisfied: toolz in /usr/local/lib/python3.10/dist-packages (from altair<6.0,>=4.2.0->gradio==3.50.2->axolotl) (0.12.1)\n",
            "Collecting gitdb<5,>=4.0.1 (from gitpython!=3.1.29,>=1.0.0->wandb->axolotl)\n",
            "  Downloading gitdb-4.0.11-py3-none-any.whl (62 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m62.7/62.7 kB\u001b[0m \u001b[31m12.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from google-auth>=1.2->gcsfs->axolotl) (5.3.3)\n",
            "Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.10/dist-packages (from google-auth>=1.2->gcsfs->axolotl) (0.4.0)\n",
            "Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.10/dist-packages (from google-auth>=1.2->gcsfs->axolotl) (4.9)\n",
            "Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from google-auth-oauthlib->gcsfs->axolotl) (1.3.1)\n",
            "Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio==3.50.2->axolotl) (1.2.1)\n",
            "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio==3.50.2->axolotl) (0.12.1)\n",
            "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio==3.50.2->axolotl) (4.52.4)\n",
            "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio==3.50.2->axolotl) (1.4.5)\n",
            "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio==3.50.2->axolotl) (3.1.2)\n",
            "Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio==3.50.2->axolotl) (2.8.2)\n",
            "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets==2.19.1->axolotl) (2023.4)\n",
            "Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets==2.19.1->axolotl) (2024.1)\n",
            "Requirement already satisfied: wcwidth in /usr/local/lib/python3.10/dist-packages (from prompt-toolkit>=3.0.0->fschat@ git+https://github.com/lm-sys/FastChat.git@27a05b04a35510afb1d767ae7e5990cbd278f8fe->axolotl) (0.2.13)\n",
            "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.10/dist-packages (from rich>=10.0.0->fschat@ git+https://github.com/lm-sys/FastChat.git@27a05b04a35510afb1d767ae7e5990cbd278f8fe->axolotl) (3.0.0)\n",
            "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from rich>=10.0.0->fschat@ git+https://github.com/lm-sys/FastChat.git@27a05b04a35510afb1d767ae7e5990cbd278f8fe->axolotl) (2.16.1)\n",
            "Requirement already satisfied: docstring-parser>=0.14.1 in /usr/local/lib/python3.10/dist-packages (from tyro>=0.5.11->trl==0.8.6->axolotl) (0.16)\n",
            "Collecting shtab>=1.5.6 (from tyro>=0.5.11->trl==0.8.6->axolotl)\n",
            "  Downloading shtab-1.7.1-py3-none-any.whl (14 kB)\n",
            "Collecting h11>=0.8 (from uvicorn>=0.14.0->gradio==3.50.2->axolotl)\n",
            "  Downloading h11-0.14.0-py3-none-any.whl (58 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m9.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting humanfriendly>=9.1 (from coloredlogs->optimum==1.16.2->axolotl)\n",
            "  Downloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m86.8/86.8 kB\u001b[0m \u001b[31m17.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting starlette<0.38.0,>=0.37.2 (from fastapi->gradio==3.50.2->axolotl)\n",
            "  Downloading starlette-0.37.2-py3-none-any.whl (71 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m71.9/71.9 kB\u001b[0m \u001b[31m13.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting fastapi-cli>=0.0.2 (from fastapi->gradio==3.50.2->axolotl)\n",
            "  Downloading fastapi_cli-0.0.4-py3-none-any.whl (9.5 kB)\n",
            "Collecting ujson!=4.0.2,!=4.1.0,!=4.2.0,!=4.3.0,!=5.0.0,!=5.1.0,>=4.0.1 (from fastapi->gradio==3.50.2->axolotl)\n",
            "  Downloading ujson-5.10.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (53 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m53.6/53.6 kB\u001b[0m \u001b[31m8.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting email_validator>=2.0.0 (from fastapi->gradio==3.50.2->axolotl)\n",
            "  Downloading email_validator-2.1.1-py3-none-any.whl (30 kB)\n",
            "Requirement already satisfied: anyio in /usr/local/lib/python3.10/dist-packages (from httpx->gradio==3.50.2->axolotl) (3.7.1)\n",
            "Collecting httpcore==1.* (from httpx->gradio==3.50.2->axolotl)\n",
            "  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m77.9/77.9 kB\u001b[0m \u001b[31m13.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from httpx->gradio==3.50.2->axolotl) (1.3.1)\n",
            "Requirement already satisfied: google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5 in /usr/local/lib/python3.10/dist-packages (from google-cloud-storage->gcsfs->axolotl) (2.11.1)\n",
            "Requirement already satisfied: google-cloud-core<3.0dev,>=2.3.0 in /usr/local/lib/python3.10/dist-packages (from google-cloud-storage->gcsfs->axolotl) (2.3.3)\n",
            "Requirement already satisfied: google-resumable-media>=2.3.2 in /usr/local/lib/python3.10/dist-packages (from google-cloud-storage->gcsfs->axolotl) (2.7.0)\n",
            "Collecting wavedrom (from markdown2[all]->fschat@ git+https://github.com/lm-sys/FastChat.git@27a05b04a35510afb1d767ae7e5990cbd278f8fe->axolotl)\n",
            "  Downloading wavedrom-2.0.3.post3.tar.gz (137 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m137.7/137.7 kB\u001b[0m \u001b[31m25.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Requirement already satisfied: mpmath<1.4.0,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy->optimum==1.16.2->axolotl) (1.3.0)\n",
            "Collecting jmespath<2.0.0,>=0.7.1 (from botocore<1.31.18,>=1.31.17->aiobotocore~=2.5.4->s3fs->axolotl)\n",
            "  Downloading jmespath-1.0.1-py3-none-any.whl (20 kB)\n",
            "Collecting urllib3<3,>=1.21.1 (from requests->axolotl)\n",
            "  Downloading urllib3-1.26.18-py2.py3-none-any.whl (143 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m143.8/143.8 kB\u001b[0m \u001b[31m26.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting dnspython>=2.0.0 (from email_validator>=2.0.0->fastapi->gradio==3.50.2->axolotl)\n",
            "  Downloading dnspython-2.6.1-py3-none-any.whl (307 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m307.7/307.7 kB\u001b[0m \u001b[31m44.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting typer>=0.12.3 (from fastapi-cli>=0.0.2->fastapi->gradio==3.50.2->axolotl)\n",
            "  Downloading typer-0.12.3-py3-none-any.whl (47 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m47.2/47.2 kB\u001b[0m \u001b[31m8.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting smmap<6,>=3.0.1 (from gitdb<5,>=4.0.1->gitpython!=3.1.29,>=1.0.0->wandb->axolotl)\n",
            "  Downloading smmap-5.0.1-py3-none-any.whl (24 kB)\n",
            "Requirement already satisfied: googleapis-common-protos<2.0.dev0,>=1.56.2 in /usr/local/lib/python3.10/dist-packages (from google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0dev,>=1.31.5->google-cloud-storage->gcsfs->axolotl) (1.63.0)\n",
            "Requirement already satisfied: google-crc32c<2.0dev,>=1.0 in /usr/local/lib/python3.10/dist-packages (from google-resumable-media>=2.3.2->google-cloud-storage->gcsfs->axolotl) (1.5.0)\n",
            "Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio==3.50.2->axolotl) (2023.12.1)\n",
            "Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio==3.50.2->axolotl) (0.35.1)\n",
            "Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio==3.50.2->axolotl) (0.18.1)\n",
            "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.10/dist-packages (from markdown-it-py>=2.2.0->rich>=10.0.0->fschat@ git+https://github.com/lm-sys/FastChat.git@27a05b04a35510afb1d767ae7e5990cbd278f8fe->axolotl) (0.1.2)\n",
            "Requirement already satisfied: pyasn1<0.7.0,>=0.4.6 in /usr/local/lib/python3.10/dist-packages (from pyasn1-modules>=0.2.1->google-auth>=1.2->gcsfs->axolotl) (0.6.0)\n",
            "Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib->gcsfs->axolotl) (3.2.2)\n",
            "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio->httpx->gradio==3.50.2->axolotl) (1.2.1)\n",
            "Collecting httptools>=0.5.0 (from uvicorn>=0.14.0->gradio==3.50.2->axolotl)\n",
            "  Downloading httptools-0.6.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (341 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m341.4/341.4 kB\u001b[0m \u001b[31m46.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting python-dotenv>=0.13 (from uvicorn>=0.14.0->gradio==3.50.2->axolotl)\n",
            "  Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)\n",
            "Collecting uvloop!=0.15.0,!=0.15.1,>=0.14.0 (from uvicorn>=0.14.0->gradio==3.50.2->axolotl)\n",
            "  Downloading uvloop-0.19.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.4/3.4 MB\u001b[0m \u001b[31m90.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting watchfiles>=0.13 (from uvicorn>=0.14.0->gradio==3.50.2->axolotl)\n",
            "  Downloading watchfiles-0.22.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.2/1.2 MB\u001b[0m \u001b[31m87.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting svgwrite (from wavedrom->markdown2[all]->fschat@ git+https://github.com/lm-sys/FastChat.git@27a05b04a35510afb1d767ae7e5990cbd278f8fe->axolotl)\n",
            "  Downloading svgwrite-1.4.3-py3-none-any.whl (67 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m67.1/67.1 kB\u001b[0m \u001b[31m12.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting shellingham>=1.3.0 (from typer>=0.12.3->fastapi-cli>=0.0.2->fastapi->gradio==3.50.2->axolotl)\n",
            "  Downloading shellingham-1.5.4-py2.py3-none-any.whl (9.8 kB)\n",
            "Building wheels for collected packages: fire, fschat, ffmpy, wavedrom\n",
            "  Building wheel for fire (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for fire: filename=fire-0.6.0-py2.py3-none-any.whl size=117029 sha256=6703418fcd5f5f6a20d984ede34416837fa2c2164b6d43b02a532446d51c797c\n",
            "  Stored in directory: /root/.cache/pip/wheels/d6/6d/5d/5b73fa0f46d01a793713f8859201361e9e581ced8c75e5c6a3\n",
            "  Building wheel for fschat (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for fschat: filename=fschat-0.2.36-py3-none-any.whl size=272079 sha256=37addba8c8ae058ced3354107feac7faba23b9f8180d6a8a69729c2aac3d67df\n",
            "  Stored in directory: /root/.cache/pip/wheels/21/dc/55/8647f928ab3e6390d35d3bb898acca851918560726ecdfc42a\n",
            "  Building wheel for ffmpy (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for ffmpy: filename=ffmpy-0.3.2-py3-none-any.whl size=5584 sha256=c16626195f0fe77838f0a73ca6ff618c3a7d630886fbfbbfd1de16749df4c1a6\n",
            "  Stored in directory: /root/.cache/pip/wheels/bd/65/9a/671fc6dcde07d4418df0c592f8df512b26d7a0029c2a23dd81\n",
            "  Building wheel for wavedrom (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for wavedrom: filename=wavedrom-2.0.3.post3-py2.py3-none-any.whl size=30055 sha256=ce11c056d20a8700686f149f7ac7e3685fea0f80b2d57823e33678bc3b003348\n",
            "  Stored in directory: /root/.cache/pip/wheels/9c/52/8c/38b454b42f712f325e26f633287484c7dc1ad469e1580c5954\n",
            "Successfully built fire fschat ffmpy wavedrom\n",
            "Installing collected packages: pydub, nh3, ffmpy, addict, zstandard, xxhash, websockets, uvloop, urllib3, ujson, svgwrite, smmap, shtab, shortuuid, shellingham, setproctitle, semantic-version, python-multipart, python-dotenv, pynvml, pydantic-core, packaging, orjson, markdown2, jmespath, humanfriendly, httptools, hf_transfer, h11, fire, einops, docker-pycreds, dnspython, dill, colorama, art, aioitertools, aiofiles, wavedrom, watchfiles, uvicorn, starlette, sentry-sdk, pydantic, multiprocess, httpcore, gitdb, email_validator, coloredlogs, botocore, tyro, typer, tiktoken, responses, httpx, gitpython, aiobotocore, xformers, wandb, s3fs, gradio-client, fastapi-cli, datasets, bitsandbytes, accelerate, fastapi, evaluate, trl, peft, gradio, fschat, optimum, axolotl\n",
            "  Attempting uninstall: urllib3\n",
            "    Found existing installation: urllib3 2.0.7\n",
            "    Uninstalling urllib3-2.0.7:\n",
            "      Successfully uninstalled urllib3-2.0.7\n",
            "  Attempting uninstall: pydantic-core\n",
            "    Found existing installation: pydantic_core 2.18.3\n",
            "    Uninstalling pydantic_core-2.18.3:\n",
            "      Successfully uninstalled pydantic_core-2.18.3\n",
            "  Attempting uninstall: packaging\n",
            "    Found existing installation: packaging 24.0\n",
            "    Uninstalling packaging-24.0:\n",
            "      Successfully uninstalled packaging-24.0\n",
            "  Attempting uninstall: pydantic\n",
            "    Found existing installation: pydantic 2.7.2\n",
            "    Uninstalling pydantic-2.7.2:\n",
            "      Successfully uninstalled pydantic-2.7.2\n",
            "  Attempting uninstall: typer\n",
            "    Found existing installation: typer 0.9.4\n",
            "    Uninstalling typer-0.9.4:\n",
            "      Successfully uninstalled typer-0.9.4\n",
            "  Running setup.py develop for axolotl\n",
            "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
            "spacy 3.7.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.\n",
            "torchtext 0.18.0 requires torch>=2.3.0, but you have torch 2.1.2 which is incompatible.\n",
            "weasel 0.3.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.\u001b[0m\u001b[31m\n",
            "\u001b[0mSuccessfully installed accelerate-0.30.1 addict-2.4.0 aiobotocore-2.5.4 aiofiles-23.2.1 aioitertools-0.11.0 art-6.2 axolotl-0.4.1 bitsandbytes-0.43.1 botocore-1.31.17 colorama-0.4.6 coloredlogs-15.0.1 datasets-2.19.1 dill-0.3.8 dnspython-2.6.1 docker-pycreds-0.4.0 einops-0.8.0 email_validator-2.1.1 evaluate-0.4.1 fastapi-0.111.0 fastapi-cli-0.0.4 ffmpy-0.3.2 fire-0.6.0 fschat-0.2.36 gitdb-4.0.11 gitpython-3.1.43 gradio-3.50.2 gradio-client-0.6.1 h11-0.14.0 hf_transfer-0.1.6 httpcore-1.0.5 httptools-0.6.1 httpx-0.27.0 humanfriendly-10.0 jmespath-1.0.1 markdown2-2.4.13 multiprocess-0.70.16 nh3-0.2.17 optimum-1.16.2 orjson-3.10.3 packaging-23.2 peft-0.11.1 pydantic-2.6.3 pydantic-core-2.16.3 pydub-0.25.1 pynvml-11.5.0 python-dotenv-1.0.1 python-multipart-0.0.9 responses-0.18.0 s3fs-2023.6.0 semantic-version-2.10.0 sentry-sdk-2.3.1 setproctitle-1.3.3 shellingham-1.5.4 shortuuid-1.0.13 shtab-1.7.1 smmap-5.0.1 starlette-0.37.2 svgwrite-1.4.3 tiktoken-0.7.0 trl-0.8.6 typer-0.12.3 tyro-0.8.4 ujson-5.10.0 urllib3-1.26.18 uvicorn-0.30.1 uvloop-0.19.0 wandb-0.17.0 watchfiles-0.22.0 wavedrom-2.0.3.post3 websockets-11.0.3 xformers-0.0.23.post1 xxhash-3.4.1 zstandard-0.22.0\n",
            "Collecting deepspeed==0.13.1\n",
            "  Downloading deepspeed-0.13.1.tar.gz (1.3 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m9.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting hjson (from deepspeed==0.13.1)\n",
            "  Downloading hjson-3.1.0-py3-none-any.whl (54 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m54.0/54.0 kB\u001b[0m \u001b[31m8.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting ninja (from deepspeed==0.13.1)\n",
            "  Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m307.2/307.2 kB\u001b[0m \u001b[31m13.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from deepspeed==0.13.1) (1.25.2)\n",
            "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from deepspeed==0.13.1) (23.2)\n",
            "Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from deepspeed==0.13.1) (5.9.5)\n",
            "Requirement already satisfied: py-cpuinfo in /usr/local/lib/python3.10/dist-packages (from deepspeed==0.13.1) (9.0.0)\n",
            "Requirement already satisfied: pydantic in /usr/local/lib/python3.10/dist-packages (from deepspeed==0.13.1) (2.6.3)\n",
            "Requirement already satisfied: pynvml in /usr/local/lib/python3.10/dist-packages (from deepspeed==0.13.1) (11.5.0)\n",
            "Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from deepspeed==0.13.1) (2.1.2)\n",
            "Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from deepspeed==0.13.1) (4.66.4)\n",
            "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic->deepspeed==0.13.1) (0.7.0)\n",
            "Requirement already satisfied: pydantic-core==2.16.3 in /usr/local/lib/python3.10/dist-packages (from pydantic->deepspeed==0.13.1) (2.16.3)\n",
            "Requirement already satisfied: typing-extensions>=4.6.1 in /usr/local/lib/python3.10/dist-packages (from pydantic->deepspeed==0.13.1) (4.12.0)\n",
            "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed==0.13.1) (3.14.0)\n",
            "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed==0.13.1) (1.12.1)\n",
            "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed==0.13.1) (3.3)\n",
            "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed==0.13.1) (3.1.4)\n",
            "Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed==0.13.1) (2023.6.0)\n",
            "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed==0.13.1) (12.1.105)\n",
            "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed==0.13.1) (12.1.105)\n",
            "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed==0.13.1) (12.1.105)\n",
            "Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed==0.13.1) (8.9.2.26)\n",
            "Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed==0.13.1) (12.1.3.1)\n",
            "Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed==0.13.1) (11.0.2.54)\n",
            "Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed==0.13.1) (10.3.2.106)\n",
            "Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed==0.13.1) (11.4.5.107)\n",
            "Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed==0.13.1) (12.1.0.106)\n",
            "Requirement already satisfied: nvidia-nccl-cu12==2.18.1 in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed==0.13.1) (2.18.1)\n",
            "Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed==0.13.1) (12.1.105)\n",
            "Requirement already satisfied: triton==2.1.0 in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed==0.13.1) (2.1.0)\n",
            "Requirement already satisfied: nvidia-nvjitlink-cu12 in /usr/local/lib/python3.10/dist-packages (from nvidia-cusolver-cu12==11.4.5.107->torch->deepspeed==0.13.1) (12.5.40)\n",
            "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->deepspeed==0.13.1) (2.1.5)\n",
            "Requirement already satisfied: mpmath<1.4.0,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->deepspeed==0.13.1) (1.3.0)\n",
            "Building wheels for collected packages: deepspeed\n",
            "  Building wheel for deepspeed (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for deepspeed: filename=deepspeed-0.13.1-py3-none-any.whl size=1350303 sha256=75a81b425c0aa8fffe1fadf01c85d094ead73199a59f87e704660d110f561103\n",
            "  Stored in directory: /root/.cache/pip/wheels/0f/fb/b5/b159b3500525eca167d8ca6e3a7e224b6075045cac90f47cf7\n",
            "Successfully built deepspeed\n",
            "Installing collected packages: ninja, hjson, deepspeed\n",
            "Successfully installed deepspeed-0.13.1 hjson-3.1.0 ninja-1.11.1.1\n",
            "Collecting mlflow==2.13.0\n",
            "  Downloading mlflow-2.13.0-py3-none-any.whl (25.0 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m25.0/25.0 MB\u001b[0m \u001b[31m56.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: Flask<4 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (2.2.5)\n",
            "Collecting alembic!=1.10.0,<2 (from mlflow==2.13.0)\n",
            "  Downloading alembic-1.13.1-py3-none-any.whl (233 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m233.4/233.4 kB\u001b[0m \u001b[31m31.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: cachetools<6,>=5.0.0 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (5.3.3)\n",
            "Requirement already satisfied: click<9,>=7.0 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (8.1.7)\n",
            "Requirement already satisfied: cloudpickle<4 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (2.2.1)\n",
            "Collecting docker<8,>=4.0.0 (from mlflow==2.13.0)\n",
            "  Downloading docker-7.1.0-py3-none-any.whl (147 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m147.8/147.8 kB\u001b[0m \u001b[31m22.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: entrypoints<1 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (0.4)\n",
            "Requirement already satisfied: gitpython<4,>=3.1.9 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (3.1.43)\n",
            "Collecting graphene<4 (from mlflow==2.13.0)\n",
            "  Downloading graphene-3.3-py2.py3-none-any.whl (128 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m128.2/128.2 kB\u001b[0m \u001b[31m22.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: importlib-metadata!=4.7.0,<8,>=3.7.0 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (7.1.0)\n",
            "Requirement already satisfied: markdown<4,>=3.3 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (3.6)\n",
            "Requirement already satisfied: matplotlib<4 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (3.7.1)\n",
            "Requirement already satisfied: numpy<2 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (1.25.2)\n",
            "Collecting opentelemetry-api<3,>=1.0.0 (from mlflow==2.13.0)\n",
            "  Downloading opentelemetry_api-1.25.0-py3-none-any.whl (59 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m59.9/59.9 kB\u001b[0m \u001b[31m9.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting opentelemetry-sdk<3,>=1.0.0 (from mlflow==2.13.0)\n",
            "  Downloading opentelemetry_sdk-1.25.0-py3-none-any.whl (107 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m107.0/107.0 kB\u001b[0m \u001b[31m16.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: packaging<25 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (23.2)\n",
            "Requirement already satisfied: pandas<3 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (2.0.3)\n",
            "Requirement already satisfied: protobuf<5,>=3.12.0 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (3.20.3)\n",
            "Requirement already satisfied: pyarrow<16,>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (14.0.2)\n",
            "Requirement already satisfied: pytz<2025 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (2023.4)\n",
            "Requirement already satisfied: pyyaml<7,>=5.1 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (6.0.1)\n",
            "Collecting querystring-parser<2 (from mlflow==2.13.0)\n",
            "  Downloading querystring_parser-1.2.4-py2.py3-none-any.whl (7.9 kB)\n",
            "Requirement already satisfied: requests<3,>=2.17.3 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (2.31.0)\n",
            "Requirement already satisfied: scikit-learn<2 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (1.2.2)\n",
            "Requirement already satisfied: scipy<2 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (1.11.4)\n",
            "Requirement already satisfied: sqlalchemy<3,>=1.4.0 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (2.0.30)\n",
            "Requirement already satisfied: sqlparse<1,>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (0.5.0)\n",
            "Requirement already satisfied: Jinja2<4,>=2.11 in /usr/local/lib/python3.10/dist-packages (from mlflow==2.13.0) (3.1.4)\n",
            "Collecting gunicorn<23 (from mlflow==2.13.0)\n",
            "  Downloading gunicorn-22.0.0-py3-none-any.whl (84 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m84.4/84.4 kB\u001b[0m \u001b[31m14.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting Mako (from alembic!=1.10.0,<2->mlflow==2.13.0)\n",
            "  Downloading Mako-1.3.5-py3-none-any.whl (78 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m78.6/78.6 kB\u001b[0m \u001b[31m14.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: typing-extensions>=4 in /usr/local/lib/python3.10/dist-packages (from alembic!=1.10.0,<2->mlflow==2.13.0) (4.12.0)\n",
            "Requirement already satisfied: urllib3>=1.26.0 in /usr/local/lib/python3.10/dist-packages (from docker<8,>=4.0.0->mlflow==2.13.0) (1.26.18)\n",
            "Requirement already satisfied: Werkzeug>=2.2.2 in /usr/local/lib/python3.10/dist-packages (from Flask<4->mlflow==2.13.0) (3.0.3)\n",
            "Requirement already satisfied: itsdangerous>=2.0 in /usr/local/lib/python3.10/dist-packages (from Flask<4->mlflow==2.13.0) (2.2.0)\n",
            "Requirement already satisfied: gitdb<5,>=4.0.1 in /usr/local/lib/python3.10/dist-packages (from gitpython<4,>=3.1.9->mlflow==2.13.0) (4.0.11)\n",
            "Collecting graphql-core<3.3,>=3.1 (from graphene<4->mlflow==2.13.0)\n",
            "  Downloading graphql_core-3.2.3-py3-none-any.whl (202 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m202.9/202.9 kB\u001b[0m \u001b[31m30.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting graphql-relay<3.3,>=3.1 (from graphene<4->mlflow==2.13.0)\n",
            "  Downloading graphql_relay-3.2.0-py3-none-any.whl (16 kB)\n",
            "Collecting aniso8601<10,>=8 (from graphene<4->mlflow==2.13.0)\n",
            "  Downloading aniso8601-9.0.1-py2.py3-none-any.whl (52 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m52.8/52.8 kB\u001b[0m \u001b[31m8.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.10/dist-packages (from importlib-metadata!=4.7.0,<8,>=3.7.0->mlflow==2.13.0) (3.19.0)\n",
            "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from Jinja2<4,>=2.11->mlflow==2.13.0) (2.1.5)\n",
            "Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4->mlflow==2.13.0) (1.2.1)\n",
            "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4->mlflow==2.13.0) (0.12.1)\n",
            "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4->mlflow==2.13.0) (4.52.4)\n",
            "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4->mlflow==2.13.0) (1.4.5)\n",
            "Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4->mlflow==2.13.0) (9.4.0)\n",
            "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4->mlflow==2.13.0) (3.1.2)\n",
            "Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib<4->mlflow==2.13.0) (2.8.2)\n",
            "Collecting deprecated>=1.2.6 (from opentelemetry-api<3,>=1.0.0->mlflow==2.13.0)\n",
            "  Downloading Deprecated-1.2.14-py2.py3-none-any.whl (9.6 kB)\n",
            "Collecting opentelemetry-semantic-conventions==0.46b0 (from opentelemetry-sdk<3,>=1.0.0->mlflow==2.13.0)\n",
            "  Downloading opentelemetry_semantic_conventions-0.46b0-py3-none-any.whl (130 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m130.5/130.5 kB\u001b[0m \u001b[31m21.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas<3->mlflow==2.13.0) (2024.1)\n",
            "Requirement already satisfied: six in /usr/local/lib/python3.10/dist-packages (from querystring-parser<2->mlflow==2.13.0) (1.16.0)\n",
            "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.17.3->mlflow==2.13.0) (3.3.2)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.17.3->mlflow==2.13.0) (3.7)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2.17.3->mlflow==2.13.0) (2024.2.2)\n",
            "Requirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from scikit-learn<2->mlflow==2.13.0) (1.4.2)\n",
            "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn<2->mlflow==2.13.0) (3.5.0)\n",
            "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from sqlalchemy<3,>=1.4.0->mlflow==2.13.0) (3.0.3)\n",
            "Requirement already satisfied: wrapt<2,>=1.10 in /usr/local/lib/python3.10/dist-packages (from deprecated>=1.2.6->opentelemetry-api<3,>=1.0.0->mlflow==2.13.0) (1.14.1)\n",
            "Requirement already satisfied: smmap<6,>=3.0.1 in /usr/local/lib/python3.10/dist-packages (from gitdb<5,>=4.0.1->gitpython<4,>=3.1.9->mlflow==2.13.0) (5.0.1)\n",
            "Installing collected packages: aniso8601, querystring-parser, Mako, gunicorn, graphql-core, deprecated, opentelemetry-api, graphql-relay, docker, alembic, opentelemetry-semantic-conventions, graphene, opentelemetry-sdk, mlflow\n",
            "Successfully installed Mako-1.3.5 alembic-1.13.1 aniso8601-9.0.1 deprecated-1.2.14 docker-7.1.0 graphene-3.3 graphql-core-3.2.3 graphql-relay-3.2.0 gunicorn-22.0.0 mlflow-2.13.0 opentelemetry-api-1.25.0 opentelemetry-sdk-1.25.0 opentelemetry-semantic-conventions-0.46b0 querystring-parser-1.2.4\n"
          ]
        }
      ],
      "source": [
        "!pip install -e git+https://github.com/OpenAccess-AI-Collective/axolotl#egg=axolotl\n",
        "# T4 does not support flash attention\n",
        "# !pip install flash-attn==\"2.5.0\"\n",
        "!pip install deepspeed==\"0.13.1\"\n",
        "!pip install mlflow==\"2.13.0\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Di9D2DY5dqmw"
      },
      "source": [
        "## Finetune Gemma\n",
        "\n",
        "Axolotl uses YAML config files to specify finetuning parameters. The YAML file below is adapted from the official [Gemma QLoRA example](https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/examples/gemma/qlora.yml).\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "id": "---NjyBnIPOR"
      },
      "outputs": [],
      "source": [
        "import yaml\n",
        "\n",
        "# Axolotl config, kept inline as a YAML string so it is easy to edit\n",
        "yaml_string = \"\"\"\n",
        "base_model: google/gemma-2b\n",
        "model_type: AutoModelForCausalLM\n",
        "tokenizer_type: AutoTokenizer\n",
        "\n",
        "load_in_8bit: false\n",
        "load_in_4bit: true\n",
        "strict: false\n",
        "\n",
        "# huggingface repo\n",
        "datasets:\n",
        "  - path: mhenrichsen/alpaca_2k_test\n",
        "    type: alpaca\n",
        "val_set_size: 0.1\n",
        "output_dir: ./outputs/out\n",
        "\n",
        "adapter: qlora\n",
        "lora_r: 4\n",
        "lora_alpha: 4\n",
        "lora_dropout: 0.05\n",
        "lora_target_linear: true\n",
        "\n",
        "sequence_len: 2048\n",
        "sample_packing: true\n",
        "eval_sample_packing: false\n",
        "pad_to_sequence_len: true\n",
        "\n",
        "wandb_project:\n",
        "wandb_entity:\n",
        "wandb_watch:\n",
        "wandb_name:\n",
        "wandb_log_model:\n",
        "\n",
        "\n",
        "gradient_accumulation_steps: 3\n",
        "micro_batch_size: 1\n",
        "num_epochs: 1\n",
        "optimizer: adamw_bnb_8bit\n",
        "lr_scheduler: cosine\n",
        "learning_rate: 0.0002\n",
        "\n",
        "train_on_inputs: false\n",
        "group_by_length: false\n",
        "# T4 does not support BF16\n",
        "bf16: false\n",
        "fp16:\n",
        "tf32: false\n",
        "\n",
        "gradient_checkpointing: true\n",
        "early_stopping_patience:\n",
        "resume_from_checkpoint:\n",
        "local_rank:\n",
        "logging_steps: 1\n",
        "xformers_attention:\n",
        "# T4 does not support flash attention\n",
        "flash_attention: false\n",
        "\n",
        "warmup_ratio: 0.1\n",
        "evals_per_epoch: 4\n",
        "eval_table_size:\n",
        "eval_max_new_tokens: 128\n",
        "saves_per_epoch: 1\n",
        "debug:\n",
        "deepspeed:\n",
        "weight_decay: 0.0\n",
        "fsdp:\n",
        "fsdp_config:\n",
        "special_tokens:\n",
        "\n",
        "\"\"\"\n",
        "\n",
        "# Convert the YAML string to a Python dictionary\n",
        "yaml_dict = yaml.safe_load(yaml_string)\n",
        "\n",
        "# Specify your file path\n",
        "file_path = \"gemma_axolotl.yaml\"\n",
        "\n",
        "# Write the YAML file\n",
        "with open(file_path, \"w\") as file:\n",
        "    yaml.dump(yaml_dict, file)"
      ]
    },
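    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a quick sanity check on the config above, the sketch below derives the number of samples per optimizer step and flags the two settings that the config comments mark as unsupported on a T4. The helper names `effective_batch_size` and `t4_config_problems` are hypothetical and not part of Axolotl, and the `num_gpus` parameter is an assumption for multi-GPU setups. The dictionary copies a few values from the YAML string.\n",
        "\n",
        "```python\n",
        "def effective_batch_size(cfg, num_gpus=1):\n",
        "    # Samples (packed sequences here) that contribute to one optimizer step.\n",
        "    return cfg['micro_batch_size'] * cfg['gradient_accumulation_steps'] * num_gpus\n",
        "\n",
        "\n",
        "def t4_config_problems(cfg):\n",
        "    # Collect settings that the config comments mark as unsupported on a T4.\n",
        "    problems = []\n",
        "    if cfg.get('bf16') is True:\n",
        "        problems.append('bf16 is not supported on a T4')\n",
        "    if cfg.get('flash_attention') is True:\n",
        "        problems.append('flash attention is not supported on a T4')\n",
        "    return problems\n",
        "\n",
        "\n",
        "# Values copied from the YAML string above.\n",
        "cfg = {\n",
        "    'micro_batch_size': 1,\n",
        "    'gradient_accumulation_steps': 3,\n",
        "    'bf16': False,\n",
        "    'flash_attention': False,\n",
        "}\n",
        "print(effective_batch_size(cfg))\n",
        "print(t4_config_problems(cfg))\n",
        "```\n",
        "\n",
        "With `micro_batch_size: 1` and `gradient_accumulation_steps: 3` on a single GPU, gradients from 3 packed sequences are accumulated before each optimizer step."
      ]
    },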
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AzXzSrR-InzJ"
      },
      "source": [
        "### Gemma setup on Hugging Face\n",
        "\n",
        "Axolotl uses Hugging Face under the hood, so you will need to:\n",
        "\n",
        "* Get access to Gemma on [huggingface.co](https://huggingface.co) by accepting the Gemma license on the Hugging Face page of the specific model, in this case [Gemma 2B](https://huggingface.co/google/gemma-2b).\n",
        "* Generate a [Hugging Face access token](https://huggingface.co/docs/hub/en/security-tokens) and configure it as a Colab secret named `HF_TOKEN`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "id": "AVvJYwne3hha"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "from google.colab import userdata\n",
        "# Note: `userdata.get` is a Colab API. If you're not using Colab, set the env\n",
        "# vars as appropriate for your system.\n",
        "os.environ[\"HF_TOKEN\"] = userdata.get(\"HF_TOKEN\")"
      ]
    },
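    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Outside Colab, `google.colab` is not importable, so the cell above fails. The following is a minimal sketch of a fallback, assuming the token has already been exported as the `HF_TOKEN` environment variable. The helper name `resolve_hf_token` and its `env` parameter are hypothetical.\n",
        "\n",
        "```python\n",
        "import os\n",
        "\n",
        "\n",
        "def resolve_hf_token(env=None):\n",
        "    # Use the given mapping, or the process environment by default.\n",
        "    env = os.environ if env is None else env\n",
        "    try:\n",
        "        # Importable only inside a Colab runtime.\n",
        "        from google.colab import userdata\n",
        "        token = userdata.get('HF_TOKEN')\n",
        "    except ImportError:\n",
        "        # Assumption: outside Colab the token was exported beforehand.\n",
        "        token = env.get('HF_TOKEN')\n",
        "    if not token:\n",
        "        raise RuntimeError('HF_TOKEN is not set')\n",
        "    return token\n",
        "```\n",
        "\n",
        "One way to use it is `os.environ['HF_TOKEN'] = resolve_hf_token()`, which raises early with a clear message instead of failing later when the gated model is downloaded."
      ]
    },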
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "sXm65eC5eR4n"
      },
      "source": [
        "Now kick off finetuning."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "id": "2pGX3hLubhkJ"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[2024-06-04 04:29:36,530] [INFO] [numexpr.utils._init_num_threads:161] [PID:3985] NumExpr defaulting to 8 threads.\n",
            "[2024-06-04 04:29:36,731] [INFO] [datasets.<module>:58] [PID:3985] PyTorch version 2.1.2 available.\n",
            "[2024-06-04 04:29:36,733] [INFO] [datasets.<module>:70] [PID:3985] Polars version 0.20.2 available.\n",
            "[2024-06-04 04:29:36,733] [INFO] [datasets.<module>:105] [PID:3985] TensorFlow version 2.15.0 available.\n",
            "[2024-06-04 04:29:36,734] [INFO] [datasets.<module>:118] [PID:3985] JAX version 0.4.26 available.\n",
            "2024-06-04 04:29:38.359720: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
            "2024-06-04 04:29:38.359772: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
            "2024-06-04 04:29:38.361162: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
            "2024-06-04 04:29:38.368328: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
            "To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
            "2024-06-04 04:29:39.353189: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
            "[2024-06-04 04:29:40,618] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
            "\u001b[33m[2024-06-04 04:29:43,136] [WARNING] [axolotl.utils.config.models.input.check_sample_packing_wo_flash:730] [PID:3985] [RANK:0] sample_packing without flash_attention or sdp_attention does not handle cross-attention.\u001b[39m\n",
            "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
            "  warnings.warn(\n",
            "config.json: 100% 627/627 [00:00<00:00, 4.79MB/s]\n",
            "[2024-06-04 04:29:43,548] [INFO] [axolotl.normalize_config:182] [PID:3985] [RANK:0] GPU memory usage baseline: 0.000GB (+0.255GB misc)\u001b[39m\n",
            "                                 dP            dP   dP \n",
            "                                 88            88   88 \n",
            "      .d8888b. dP.  .dP .d8888b. 88 .d8888b. d8888P 88 \n",
            "      88'  `88  `8bd8'  88'  `88 88 88'  `88   88   88 \n",
            "      88.  .88  .d88b.  88.  .88 88 88.  .88   88   88 \n",
            "      `88888P8 dP'  `dP `88888P' dP `88888P'   dP   dP \n",
            "                                                       \n",
            "                                                       \n",
            "\n",
            "****************************************\n",
            "**** Axolotl Dependency Versions *****\n",
            "  accelerate: 0.30.1         \n",
            "        peft: 0.11.1         \n",
            "transformers: 4.41.1         \n",
            "         trl: 0.8.6          \n",
            "       torch: 2.1.2          \n",
            "bitsandbytes: 0.43.1         \n",
            "****************************************\n",
            "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
            "  warnings.warn(\n",
            "tokenizer_config.json: 100% 33.6k/33.6k [00:00<00:00, 4.50MB/s]\n",
            "tokenizer.model: 100% 4.24M/4.24M [00:00<00:00, 60.3MB/s]\n",
            "tokenizer.json: 100% 17.5M/17.5M [00:00<00:00, 263MB/s]\n",
            "special_tokens_map.json: 100% 636/636 [00:00<00:00, 7.02MB/s]\n",
            "[2024-06-04 04:29:45,781] [DEBUG] [axolotl.load_tokenizer:280] [PID:3985] [RANK:0] EOS: 1 / <eos>\u001b[39m\n",
            "[2024-06-04 04:29:45,781] [DEBUG] [axolotl.load_tokenizer:281] [PID:3985] [RANK:0] BOS: 2 / <bos>\u001b[39m\n",
            "[2024-06-04 04:29:45,781] [DEBUG] [axolotl.load_tokenizer:282] [PID:3985] [RANK:0] PAD: 0 / <pad>\u001b[39m\n",
            "[2024-06-04 04:29:45,781] [DEBUG] [axolotl.load_tokenizer:283] [PID:3985] [RANK:0] UNK: 3 / <unk>\u001b[39m\n",
            "[2024-06-04 04:29:45,781] [INFO] [axolotl.load_tokenizer:294] [PID:3985] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.\u001b[39m\n",
            "[2024-06-04 04:29:45,781] [INFO] [axolotl.load_tokenized_prepared_datasets:183] [PID:3985] [RANK:0] Unable to find prepared dataset in last_run_prepared/eab937609da4aad53dd78ca795825242\u001b[39m\n",
            "[2024-06-04 04:29:45,781] [INFO] [axolotl.load_tokenized_prepared_datasets:184] [PID:3985] [RANK:0] Loading raw datasets...\u001b[39m\n",
            "\u001b[33m[2024-06-04 04:29:45,781] [WARNING] [axolotl.load_tokenized_prepared_datasets:186] [PID:3985] [RANK:0] Processing datasets during training can lead to VRAM instability. Please pre-process your dataset.\u001b[39m\n",
            "[2024-06-04 04:29:45,781] [INFO] [axolotl.load_tokenized_prepared_datasets:193] [PID:3985] [RANK:0] No seed provided, using default seed of 42\u001b[39m\n",
            "Downloading readme: 100% 28.0/28.0 [00:00<00:00, 245kB/s]\n",
            "Downloading data: 100% 1.76M/1.76M [00:00<00:00, 3.67MB/s]\n",
            "Generating train split: 100% 2000/2000 [00:00<00:00, 66845.23 examples/s]\n",
            "/usr/local/lib/python3.10/dist-packages/multiprocess/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.\n",
            "  self.pid = os.fork()\n",
            "Tokenizing Prompts (num_proc=8): 100% 2000/2000 [00:04<00:00, 401.09 examples/s]\n",
            "[2024-06-04 04:29:57,528] [INFO] [axolotl.load_tokenized_prepared_datasets:414] [PID:3985] [RANK:0] merging datasets\u001b[39m\n",
            "Dropping Long Sequences (num_proc=8): 100% 2000/2000 [00:00<00:00, 6958.21 examples/s]\n",
            "Add position_id column (Sample Packing) (num_proc=8): 100% 2000/2000 [00:00<00:00, 6215.27 examples/s]\n",
            "[2024-06-04 04:29:58,512] [INFO] [axolotl.load_tokenized_prepared_datasets:427] [PID:3985] [RANK:0] Saving merged prepared dataset to disk... last_run_prepared/eab937609da4aad53dd78ca795825242\u001b[39m\n",
            "Saving the dataset (1/1 shards): 100% 2000/2000 [00:00<00:00, 92869.33 examples/s]\n",
            "[2024-06-04 04:29:58,548] [DEBUG] [axolotl.calculate_total_num_steps:299] [PID:3985] [RANK:0] total_num_tokens: 337_140\u001b[39m\n",
            "[2024-06-04 04:29:58,568] [DEBUG] [axolotl.calculate_total_num_steps:312] [PID:3985] [RANK:0] `total_supervised_tokens: 242_720`\u001b[39m\n",
            "[2024-06-04 04:30:03,986] [INFO] [axolotl.utils.samplers.multipack._len_est:185] [PID:3985] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 337140\u001b[39m\n",
            "[2024-06-04 04:30:03,986] [DEBUG] [axolotl.calculate_total_num_steps:364] [PID:3985] [RANK:0] data_loader_len: 53\u001b[39m\n",
            "[2024-06-04 04:30:03,986] [INFO] [axolotl.calc_sample_packing_eff_est:370] [PID:3985] [RANK:0] sample_packing_eff_est across ranks: [0.9300516419491526]\u001b[39m\n",
            "[2024-06-04 04:30:03,986] [DEBUG] [axolotl.calculate_total_num_steps:382] [PID:3985] [RANK:0] sample_packing_eff_est: 0.94\u001b[39m\n",
            "[2024-06-04 04:30:03,986] [DEBUG] [axolotl.calculate_total_num_steps:390] [PID:3985] [RANK:0] total_num_steps: 53\u001b[39m\n",
            "[2024-06-04 04:30:03,986] [DEBUG] [axolotl.train.train:56] [PID:3985] [RANK:0] loading tokenizer... google/gemma-2b\u001b[39m\n",
            "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
            "  warnings.warn(\n",
            "[2024-06-04 04:30:04,948] [DEBUG] [axolotl.load_tokenizer:280] [PID:3985] [RANK:0] EOS: 1 / <eos>\u001b[39m\n",
            "[2024-06-04 04:30:04,949] [DEBUG] [axolotl.load_tokenizer:281] [PID:3985] [RANK:0] BOS: 2 / <bos>\u001b[39m\n",
            "[2024-06-04 04:30:04,949] [DEBUG] [axolotl.load_tokenizer:282] [PID:3985] [RANK:0] PAD: 0 / <pad>\u001b[39m\n",
            "[2024-06-04 04:30:04,949] [DEBUG] [axolotl.load_tokenizer:283] [PID:3985] [RANK:0] UNK: 3 / <unk>\u001b[39m\n",
            "[2024-06-04 04:30:04,949] [INFO] [axolotl.load_tokenizer:294] [PID:3985] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.\u001b[39m\n",
            "[2024-06-04 04:30:04,949] [DEBUG] [axolotl.train.train:85] [PID:3985] [RANK:0] loading model and peft_config...\u001b[39m\n",
            "model.safetensors.index.json: 100% 13.5k/13.5k [00:00<00:00, 87.4MB/s]\n",
            "Downloading shards:   0% 0/2 [00:00<?, ?it/s]\n",
            "model-00001-of-00002.safetensors:   0% 0.00/4.95G [00:00<?, ?B/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:   0% 10.5M/4.95G [00:00<01:54, 42.9MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:   1% 41.9M/4.95G [00:00<00:35, 138MB/s] \u001b[A\n",
            "model-00001-of-00002.safetensors:   1% 73.4M/4.95G [00:00<00:25, 189MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:   2% 105M/4.95G [00:00<00:22, 214MB/s] \u001b[A\n",
            "model-00001-of-00002.safetensors:   3% 136M/4.95G [00:00<00:21, 227MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:   3% 168M/4.95G [00:00<00:19, 245MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:   4% 199M/4.95G [00:00<00:18, 257MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:   5% 231M/4.95G [00:01<00:17, 265MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:   5% 262M/4.95G [00:01<00:17, 272MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:   6% 294M/4.95G [00:01<00:17, 272MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:   7% 325M/4.95G [00:01<00:16, 276MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:   7% 357M/4.95G [00:01<00:16, 275MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:   8% 388M/4.95G [00:01<00:16, 273MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:   8% 419M/4.95G [00:01<00:16, 277MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:   9% 451M/4.95G [00:01<00:16, 280MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  10% 482M/4.95G [00:01<00:16, 275MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  10% 514M/4.95G [00:02<00:16, 268MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  11% 545M/4.95G [00:02<00:17, 257MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  12% 577M/4.95G [00:02<00:16, 259MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  12% 608M/4.95G [00:02<00:17, 252MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  13% 640M/4.95G [00:02<00:17, 250MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  14% 671M/4.95G [00:02<00:17, 244MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  14% 703M/4.95G [00:02<00:17, 240MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  15% 734M/4.95G [00:03<00:17, 235MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  15% 765M/4.95G [00:03<00:17, 233MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  16% 797M/4.95G [00:03<00:17, 233MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  17% 828M/4.95G [00:03<00:17, 229MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  17% 860M/4.95G [00:03<00:17, 229MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  18% 891M/4.95G [00:03<00:17, 232MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  19% 923M/4.95G [00:03<00:16, 239MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  19% 954M/4.95G [00:03<00:16, 238MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  20% 986M/4.95G [00:04<00:16, 241MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  21% 1.02G/4.95G [00:04<00:16, 240MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  21% 1.05G/4.95G [00:04<00:16, 240MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  22% 1.08G/4.95G [00:04<00:15, 243MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  22% 1.11G/4.95G [00:04<00:15, 246MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  23% 1.14G/4.95G [00:04<00:15, 248MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  24% 1.17G/4.95G [00:04<00:15, 251MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  24% 1.21G/4.95G [00:04<00:14, 254MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  25% 1.24G/4.95G [00:05<00:14, 260MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  26% 1.27G/4.95G [00:05<00:13, 263MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  26% 1.30G/4.95G [00:05<00:13, 267MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  27% 1.33G/4.95G [00:05<00:13, 268MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  28% 1.36G/4.95G [00:05<00:13, 264MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  28% 1.39G/4.95G [00:05<00:13, 258MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  29% 1.43G/4.95G [00:05<00:13, 262MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  29% 1.46G/4.95G [00:05<00:13, 266MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  30% 1.49G/4.95G [00:06<00:12, 269MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  31% 1.52G/4.95G [00:06<00:12, 271MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  31% 1.55G/4.95G [00:06<00:12, 272MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  32% 1.58G/4.95G [00:06<00:12, 267MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  33% 1.61G/4.95G [00:06<00:12, 265MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  33% 1.65G/4.95G [00:06<00:12, 258MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  34% 1.68G/4.95G [00:06<00:12, 257MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  35% 1.71G/4.95G [00:06<00:12, 256MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  35% 1.74G/4.95G [00:06<00:12, 251MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  36% 1.77G/4.95G [00:07<00:12, 248MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  36% 1.80G/4.95G [00:07<00:12, 245MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  37% 1.84G/4.95G [00:07<00:12, 242MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  38% 1.87G/4.95G [00:07<00:12, 241MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  38% 1.90G/4.95G [00:07<00:12, 240MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  39% 1.93G/4.95G [00:07<00:12, 238MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  40% 1.96G/4.95G [00:07<00:12, 236MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  40% 1.99G/4.95G [00:08<00:12, 236MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  41% 2.02G/4.95G [00:08<00:11, 246MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  42% 2.06G/4.95G [00:08<00:11, 245MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  42% 2.09G/4.95G [00:08<00:11, 251MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  43% 2.12G/4.95G [00:08<00:11, 250MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  43% 2.15G/4.95G [00:08<00:10, 254MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  44% 2.18G/4.95G [00:08<00:10, 263MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  45% 2.21G/4.95G [00:08<00:10, 269MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  45% 2.24G/4.95G [00:08<00:09, 272MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  46% 2.28G/4.95G [00:09<00:09, 270MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  47% 2.31G/4.95G [00:09<00:09, 269MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  47% 2.34G/4.95G [00:09<00:09, 271MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  48% 2.37G/4.95G [00:09<00:09, 271MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  49% 2.40G/4.95G [00:09<00:09, 270MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  49% 2.43G/4.95G [00:09<00:09, 268MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  50% 2.46G/4.95G [00:09<00:09, 271MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  50% 2.50G/4.95G [00:09<00:09, 270MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  51% 2.53G/4.95G [00:10<00:09, 268MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  52% 2.56G/4.95G [00:10<00:08, 268MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  52% 2.59G/4.95G [00:10<00:08, 267MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  53% 2.62G/4.95G [00:10<00:08, 273MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  54% 2.65G/4.95G [00:10<00:08, 271MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  54% 2.68G/4.95G [00:10<00:08, 270MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  55% 2.72G/4.95G [00:10<00:08, 267MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  56% 2.75G/4.95G [00:10<00:08, 265MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  56% 2.78G/4.95G [00:10<00:08, 263MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  57% 2.81G/4.95G [00:11<00:08, 247MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  57% 2.84G/4.95G [00:11<00:09, 217MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  58% 2.87G/4.95G [00:11<00:09, 220MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  59% 2.90G/4.95G [00:11<00:09, 227MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  59% 2.94G/4.95G [00:11<00:08, 233MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  60% 2.97G/4.95G [00:11<00:08, 235MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  61% 3.00G/4.95G [00:11<00:08, 238MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  61% 3.03G/4.95G [00:12<00:08, 239MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  62% 3.06G/4.95G [00:12<00:07, 240MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  63% 3.09G/4.95G [00:12<00:07, 239MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  63% 3.12G/4.95G [00:12<00:08, 220MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  64% 3.16G/4.95G [00:12<00:09, 194MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  64% 3.18G/4.95G [00:12<00:08, 197MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  65% 3.21G/4.95G [00:12<00:08, 208MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  66% 3.24G/4.95G [00:13<00:07, 221MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  66% 3.27G/4.95G [00:13<00:07, 231MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  67% 3.30G/4.95G [00:13<00:06, 243MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  67% 3.33G/4.95G [00:13<00:06, 254MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  68% 3.37G/4.95G [00:13<00:06, 256MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  69% 3.40G/4.95G [00:13<00:05, 259MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  69% 3.43G/4.95G [00:13<00:05, 259MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  70% 3.46G/4.95G [00:13<00:05, 260MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  71% 3.49G/4.95G [00:14<00:05, 262MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  71% 3.52G/4.95G [00:14<00:05, 267MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  72% 3.55G/4.95G [00:14<00:05, 266MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  73% 3.59G/4.95G [00:14<00:05, 265MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  73% 3.62G/4.95G [00:14<00:05, 262MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  74% 3.65G/4.95G [00:14<00:05, 259MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  74% 3.68G/4.95G [00:14<00:04, 257MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  75% 3.71G/4.95G [00:14<00:04, 265MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  76% 3.74G/4.95G [00:15<00:04, 259MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  76% 3.77G/4.95G [00:15<00:04, 259MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  77% 3.81G/4.95G [00:15<00:04, 260MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  78% 3.84G/4.95G [00:15<00:04, 257MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  78% 3.87G/4.95G [00:15<00:04, 253MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  79% 3.90G/4.95G [00:15<00:04, 254MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  80% 3.93G/4.95G [00:15<00:03, 257MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  80% 3.96G/4.95G [00:15<00:03, 253MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  81% 4.00G/4.95G [00:16<00:03, 249MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  81% 4.03G/4.95G [00:16<00:03, 247MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  82% 4.06G/4.95G [00:16<00:03, 244MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  83% 4.09G/4.95G [00:16<00:03, 244MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  83% 4.12G/4.95G [00:16<00:03, 242MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  84% 4.15G/4.95G [00:16<00:03, 242MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  85% 4.18G/4.95G [00:16<00:03, 242MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  85% 4.22G/4.95G [00:16<00:02, 245MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  86% 4.25G/4.95G [00:17<00:02, 246MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  87% 4.28G/4.95G [00:17<00:02, 245MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  87% 4.31G/4.95G [00:17<00:02, 245MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  88% 4.34G/4.95G [00:17<00:02, 246MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  88% 4.37G/4.95G [00:17<00:02, 245MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  89% 4.40G/4.95G [00:17<00:02, 244MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  90% 4.44G/4.95G [00:17<00:02, 245MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  90% 4.47G/4.95G [00:17<00:01, 246MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  91% 4.50G/4.95G [00:18<00:01, 250MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  92% 4.53G/4.95G [00:18<00:01, 255MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  92% 4.56G/4.95G [00:18<00:01, 256MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  93% 4.59G/4.95G [00:18<00:01, 262MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  94% 4.62G/4.95G [00:18<00:01, 268MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  94% 4.66G/4.95G [00:18<00:01, 266MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  95% 4.69G/4.95G [00:18<00:00, 263MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  95% 4.72G/4.95G [00:18<00:00, 263MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  96% 4.75G/4.95G [00:19<00:00, 265MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  97% 4.78G/4.95G [00:19<00:00, 265MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  97% 4.81G/4.95G [00:19<00:00, 265MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  98% 4.84G/4.95G [00:19<00:00, 265MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  99% 4.88G/4.95G [00:19<00:00, 270MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors:  99% 4.91G/4.95G [00:19<00:00, 276MB/s]\u001b[A\n",
            "model-00001-of-00002.safetensors: 100% 4.95G/4.95G [00:19<00:00, 251MB/s]\n",
            "Downloading shards:  50% 1/2 [00:19<00:19, 19.93s/it]\n",
            "model-00002-of-00002.safetensors:   0% 0.00/67.1M [00:00<?, ?B/s]\u001b[A\n",
            "model-00002-of-00002.safetensors:  47% 31.5M/67.1M [00:00<00:00, 291MB/s]\u001b[A\n",
            "model-00002-of-00002.safetensors: 100% 67.1M/67.1M [00:00<00:00, 259MB/s]\n",
            "Downloading shards: 100% 2/2 [00:20<00:00, 10.20s/it]\n",
            "`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.\n",
            "Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use\n",
            "`config.hidden_activation` if you want to override this behaviour.\n",
            "See https://github.com/huggingface/transformers/pull/29402 for more details.\n",
            "[2024-06-04 04:30:26,142] [INFO] [accelerate.utils.modeling.get_balanced_memory:989] [PID:3985] We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).\n",
            "Loading checkpoint shards: 100% 2/2 [00:05<00:00,  2.76s/it]\n",
            "generation_config.json: 100% 137/137 [00:00<00:00, 944kB/s]\n",
            "[2024-06-04 04:30:32,083] [INFO] [axolotl.load_model:734] [PID:3985] [RANK:0] GPU memory usage after model load: 2.906GB (+0.004GB cache, +0.368GB misc)\u001b[39m\n",
            "[2024-06-04 04:30:32,086] [INFO] [axolotl.load_model:785] [PID:3985] [RANK:0] converting PEFT model w/ prepare_model_for_kbit_training\u001b[39m\n",
            "[2024-06-04 04:30:32,088] [INFO] [axolotl.load_model:794] [PID:3985] [RANK:0] converting modules to torch.float32 for flash attention\u001b[39m\n",
            "[2024-06-04 04:30:32,090] [INFO] [axolotl.load_lora:951] [PID:3985] [RANK:0] found linear modules: ['up_proj', 'down_proj', 'k_proj', 'gate_proj', 'q_proj', 'v_proj', 'o_proj']\u001b[39m\n",
            "trainable params: 4,902,912 || all params: 2,511,075,328 || trainable%: 0.1953\n",
            "[2024-06-04 04:30:32,310] [INFO] [axolotl.load_model:843] [PID:3985] [RANK:0] GPU memory usage after adapters: 2.915GB (+0.005GB cache, +0.368GB misc)\u001b[39m\n",
            "/usr/local/lib/python3.10/dist-packages/transformers/training_args.py:1474: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead\n",
            "  warnings.warn(\n",
            "[2024-06-04 04:30:32,341] [INFO] [axolotl.train.train:119] [PID:3985] [RANK:0] Pre-saving adapter config to ./outputs/out\u001b[39m\n",
            "[2024-06-04 04:30:32,756] [INFO] [axolotl.train.train:156] [PID:3985] [RANK:0] Starting trainer...\u001b[39m\n",
            "[2024-06-04 04:30:33,133] [INFO] [axolotl.utils.samplers.multipack._len_est:185] [PID:3985] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 337140\u001b[39m\n",
            "[2024-06-04 04:30:33,134] [INFO] [axolotl.utils.samplers.multipack._len_est:185] [PID:3985] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 337140\u001b[39m\n",
            "  0% 0/53 [00:00<?, ?it/s][2024-06-04 04:30:33,176] [INFO] [axolotl.utils.samplers.multipack._len_est:185] [PID:3985] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 337140\u001b[39m\n",
            "You're using a GemmaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
            "{'loss': 3.7019, 'grad_norm': 0.392578125, 'learning_rate': 4e-05, 'epoch': 0.02}\n",
            "  2% 1/53 [00:29<25:21, 29.27s/it]\n",
            "  0% 0/200 [00:00<?, ?it/s]\u001b[A\n",
            "  1% 2/200 [00:02<04:42,  1.43s/it]\u001b[A\n",
            "  2% 3/200 [00:05<06:40,  2.03s/it]\u001b[A\n",
            "  2% 4/200 [00:08<07:41,  2.36s/it]\u001b[A\n",
            "  2% 5/200 [00:11<08:15,  2.54s/it]\u001b[A\n",
            "  3% 6/200 [00:14<08:35,  2.66s/it]\u001b[A\n",
            "  4% 7/200 [00:17<08:52,  2.76s/it]\u001b[A\n",
            "  4% 8/200 [00:20<09:01,  2.82s/it]\u001b[A\n",
            "  4% 9/200 [00:23<09:07,  2.87s/it]\u001b[A\n",
            "  5% 10/200 [00:26<09:12,  2.91s/it]\u001b[A\n",
            "  6% 11/200 [00:29<09:12,  2.92s/it]\u001b[A\n",
            "  6% 12/200 [00:32<09:11,  2.94s/it]\u001b[A\n",
            "  6% 13/200 [00:35<09:12,  2.95s/it]\u001b[A\n",
            "  7% 14/200 [00:38<09:11,  2.97s/it]\u001b[A\n",
            "  8% 15/200 [00:41<09:12,  2.99s/it]\u001b[A\n",
            "  8% 16/200 [00:44<09:14,  3.01s/it]\u001b[A\n",
            "  8% 17/200 [00:47<09:13,  3.02s/it]\u001b[A\n",
            "  9% 18/200 [00:50<09:12,  3.03s/it]\u001b[A\n",
            " 10% 19/200 [00:53<09:12,  3.05s/it]\u001b[A\n",
            " 10% 20/200 [00:56<09:12,  3.07s/it]\u001b[A\n",
            " 10% 21/200 [00:59<09:09,  3.07s/it]\u001b[A\n",
            " 11% 22/200 [01:02<09:09,  3.09s/it]\u001b[A\n",
            " 12% 23/200 [01:05<09:08,  3.10s/it]\u001b[A\n",
            " 12% 24/200 [01:09<09:04,  3.10s/it]\u001b[A\n",
            " 12% 25/200 [01:12<09:05,  3.12s/it]\u001b[A\n",
            " 13% 26/200 [01:15<09:06,  3.14s/it]\u001b[A\n",
            " 14% 27/200 [01:18<09:07,  3.16s/it]\u001b[A\n",
            " 14% 28/200 [01:21<09:04,  3.17s/it]\u001b[A\n",
            " 14% 29/200 [01:24<08:59,  3.15s/it]\u001b[A\n",
            " 15% 30/200 [01:28<08:55,  3.15s/it]\u001b[A\n",
            " 16% 31/200 [01:31<08:52,  3.15s/it]\u001b[A\n",
            " 16% 32/200 [01:34<08:46,  3.14s/it]\u001b[A\n",
            " 16% 33/200 [01:37<08:42,  3.13s/it]\u001b[A\n",
            " 17% 34/200 [01:40<08:38,  3.12s/it]\u001b[A\n",
            " 18% 35/200 [01:43<08:32,  3.11s/it]\u001b[A\n",
            " 18% 36/200 [01:46<08:27,  3.09s/it]\u001b[A\n",
            " 18% 37/200 [01:49<08:23,  3.09s/it]\u001b[A\n",
            " 19% 38/200 [01:52<08:18,  3.08s/it]\u001b[A\n",
            " 20% 39/200 [01:55<08:14,  3.07s/it]\u001b[A\n",
            " 20% 40/200 [01:58<08:12,  3.08s/it]\u001b[A\n",
            " 20% 41/200 [02:02<08:07,  3.07s/it]\u001b[A\n",
            " 21% 42/200 [02:05<08:04,  3.07s/it]\u001b[A\n",
            " 22% 43/200 [02:08<08:02,  3.07s/it]\u001b[A\n",
            " 22% 44/200 [02:11<07:58,  3.07s/it]\u001b[A\n",
            " 22% 45/200 [02:14<07:54,  3.06s/it]\u001b[A\n",
            " 23% 46/200 [02:17<07:52,  3.07s/it]\u001b[A\n",
            " 24% 47/200 [02:20<07:49,  3.07s/it]\u001b[A\n",
            " 24% 48/200 [02:23<07:47,  3.08s/it]\u001b[A\n",
            " 24% 49/200 [02:26<07:46,  3.09s/it]\u001b[A\n",
            " 25% 50/200 [02:29<07:43,  3.09s/it]\u001b[A\n",
            " 26% 51/200 [02:32<07:39,  3.08s/it]\u001b[A\n",
            " 26% 52/200 [02:35<07:38,  3.10s/it]\u001b[A\n",
            " 26% 53/200 [02:39<07:37,  3.11s/it]\u001b[A\n",
            " 27% 54/200 [02:42<07:33,  3.11s/it]\u001b[A\n",
            " 28% 55/200 [02:45<07:31,  3.11s/it]\u001b[A\n",
            " 28% 56/200 [02:48<07:27,  3.11s/it]\u001b[A\n",
            " 28% 57/200 [02:51<07:25,  3.11s/it]\u001b[A\n",
            " 29% 58/200 [02:54<07:22,  3.11s/it]\u001b[A\n",
            " 30% 59/200 [02:57<07:17,  3.10s/it]\u001b[A\n",
            " 30% 60/200 [03:00<07:13,  3.09s/it]\u001b[A\n",
            " 30% 61/200 [03:03<07:10,  3.10s/it]\u001b[A\n",
            " 31% 62/200 [03:06<07:06,  3.09s/it]\u001b[A\n",
            " 32% 63/200 [03:10<07:02,  3.08s/it]\u001b[A\n",
            " 32% 64/200 [03:13<07:00,  3.09s/it]\u001b[A\n",
            " 32% 65/200 [03:16<06:55,  3.08s/it]\u001b[A\n",
            " 33% 66/200 [03:19<06:51,  3.07s/it]\u001b[A\n",
            " 34% 67/200 [03:22<06:49,  3.08s/it]\u001b[A\n",
            " 34% 68/200 [03:25<06:46,  3.08s/it]\u001b[A\n",
            " 34% 69/200 [03:28<06:44,  3.09s/it]\u001b[A\n",
            " 35% 70/200 [03:31<06:42,  3.10s/it]\u001b[A\n",
            " 36% 71/200 [03:34<06:39,  3.10s/it]\u001b[A\n",
            " 36% 72/200 [03:37<06:36,  3.10s/it]\u001b[A\n",
            " 36% 73/200 [03:40<06:34,  3.11s/it]\u001b[A\n",
            " 37% 74/200 [03:44<06:29,  3.09s/it]\u001b[A\n",
            " 38% 75/200 [03:47<06:25,  3.09s/it]\u001b[A\n",
            " 38% 76/200 [03:50<06:23,  3.09s/it]\u001b[A\n",
            " 38% 77/200 [03:53<06:20,  3.09s/it]\u001b[A\n",
            " 39% 78/200 [03:56<06:16,  3.08s/it]\u001b[A\n",
            " 40% 79/200 [03:59<06:13,  3.09s/it]\u001b[A\n",
            " 40% 80/200 [04:02<06:09,  3.08s/it]\u001b[A\n",
            " 40% 81/200 [04:05<06:06,  3.08s/it]\u001b[A\n",
            " 41% 82/200 [04:08<06:05,  3.09s/it]\u001b[A\n",
            " 42% 83/200 [04:11<06:02,  3.10s/it]\u001b[A\n",
            " 42% 84/200 [04:14<06:00,  3.11s/it]\u001b[A\n",
            " 42% 85/200 [04:18<05:57,  3.11s/it]\u001b[A\n",
            " 43% 86/200 [04:21<05:52,  3.09s/it]\u001b[A\n",
            " 44% 87/200 [04:24<05:49,  3.09s/it]\u001b[A\n",
            " 44% 88/200 [04:27<05:45,  3.09s/it]\u001b[A\n",
            " 44% 89/200 [04:30<05:41,  3.08s/it]\u001b[A\n",
            " 45% 90/200 [04:33<05:39,  3.08s/it]\u001b[A\n",
            " 46% 91/200 [04:36<05:36,  3.09s/it]\u001b[A\n",
            " 46% 92/200 [04:39<05:32,  3.08s/it]\u001b[A\n",
            " 46% 93/200 [04:42<05:29,  3.08s/it]\u001b[A\n",
            " 47% 94/200 [04:45<05:27,  3.09s/it]\u001b[A\n",
            " 48% 95/200 [04:48<05:24,  3.09s/it]\u001b[A\n",
            " 48% 96/200 [04:51<05:21,  3.09s/it]\u001b[A\n",
            " 48% 97/200 [04:55<05:19,  3.10s/it]\u001b[A\n",
            " 49% 98/200 [04:58<05:15,  3.09s/it]\u001b[A\n",
            " 50% 99/200 [05:01<05:12,  3.09s/it]\u001b[A\n",
            " 50% 100/200 [05:04<05:09,  3.10s/it]\u001b[A\n",
            " 50% 101/200 [05:07<05:07,  3.10s/it]\u001b[A\n",
            " 51% 102/200 [05:10<05:04,  3.10s/it]\u001b[A\n",
            " 52% 103/200 [05:13<05:01,  3.10s/it]\u001b[A\n",
            " 52% 104/200 [05:16<04:57,  3.10s/it]\u001b[A\n",
            " 52% 105/200 [05:19<04:52,  3.08s/it]\u001b[A\n",
            " 53% 106/200 [05:22<04:50,  3.09s/it]\u001b[A\n",
            " 54% 107/200 [05:26<04:46,  3.08s/it]\u001b[A\n",
            " 54% 108/200 [05:29<04:43,  3.08s/it]\u001b[A\n",
            " 55% 109/200 [05:32<04:41,  3.10s/it]\u001b[A\n",
            " 55% 110/200 [05:35<04:39,  3.10s/it]\u001b[A\n",
            " 56% 111/200 [05:38<04:35,  3.10s/it]\u001b[A\n",
            " 56% 112/200 [05:41<04:33,  3.11s/it]\u001b[A\n",
            " 56% 113/200 [05:44<04:29,  3.10s/it]\u001b[A\n",
            " 57% 114/200 [05:47<04:26,  3.10s/it]\u001b[A\n",
            " 57% 115/200 [05:50<04:23,  3.10s/it]\u001b[A\n",
            " 58% 116/200 [05:53<04:19,  3.09s/it]\u001b[A\n",
            " 58% 117/200 [05:57<04:16,  3.09s/it]\u001b[A\n",
            " 59% 118/200 [06:00<04:14,  3.10s/it]\u001b[A\n",
            " 60% 119/200 [06:03<04:10,  3.10s/it]\u001b[A\n",
            " 60% 120/200 [06:06<04:07,  3.10s/it]\u001b[A\n",
            " 60% 121/200 [06:09<04:05,  3.10s/it]\u001b[A\n",
            " 61% 122/200 [06:12<04:01,  3.10s/it]\u001b[A\n",
            " 62% 123/200 [06:15<03:58,  3.10s/it]\u001b[A\n",
            " 62% 124/200 [06:18<03:55,  3.10s/it]\u001b[A\n",
            " 62% 125/200 [06:21<03:52,  3.11s/it]\u001b[A\n",
            " 63% 126/200 [06:24<03:49,  3.10s/it]\u001b[A\n",
            " 64% 127/200 [06:28<03:46,  3.10s/it]\u001b[A\n",
            " 64% 128/200 [06:31<03:42,  3.09s/it]\u001b[A\n",
            " 64% 129/200 [06:34<03:39,  3.09s/it]\u001b[A\n",
            " 65% 130/200 [06:37<03:36,  3.10s/it]\u001b[A\n",
            " 66% 131/200 [06:40<03:33,  3.09s/it]\u001b[A\n",
            " 66% 132/200 [06:43<03:29,  3.09s/it]\u001b[A\n",
            " 66% 133/200 [06:46<03:27,  3.10s/it]\u001b[A\n",
            " 67% 134/200 [06:49<03:24,  3.10s/it]\u001b[A\n",
            " 68% 135/200 [06:52<03:20,  3.09s/it]\u001b[A\n",
            " 68% 136/200 [06:55<03:18,  3.10s/it]\u001b[A\n",
            " 68% 137/200 [06:58<03:14,  3.09s/it]\u001b[A\n",
            " 69% 138/200 [07:02<03:12,  3.10s/it]\u001b[A\n",
            " 70% 139/200 [07:05<03:09,  3.11s/it]\u001b[A\n",
            " 70% 140/200 [07:08<03:06,  3.11s/it]\u001b[A\n",
            " 70% 141/200 [07:11<03:03,  3.11s/it]\u001b[A\n",
            " 71% 142/200 [07:14<03:01,  3.12s/it]\u001b[A\n",
            " 72% 143/200 [07:17<02:57,  3.12s/it]\u001b[A\n",
            " 72% 144/200 [07:20<02:53,  3.10s/it]\u001b[A\n",
            " 72% 145/200 [07:23<02:50,  3.11s/it]\u001b[A\n",
            " 73% 146/200 [07:26<02:47,  3.11s/it]\u001b[A\n",
            " 74% 147/200 [07:30<02:44,  3.11s/it]\u001b[A\n",
            " 74% 148/200 [07:33<02:41,  3.11s/it]\u001b[A\n",
            " 74% 149/200 [07:36<02:38,  3.10s/it]\u001b[A\n",
            " 75% 150/200 [07:39<02:35,  3.11s/it]\u001b[A\n",
            " 76% 151/200 [07:42<02:32,  3.11s/it]\u001b[A\n",
            " 76% 152/200 [07:45<02:28,  3.10s/it]\u001b[A\n",
            " 76% 153/200 [07:48<02:25,  3.10s/it]\u001b[A\n",
            " 77% 154/200 [07:51<02:22,  3.10s/it]\u001b[A\n",
            " 78% 155/200 [07:54<02:18,  3.09s/it]\u001b[A\n",
            " 78% 156/200 [07:57<02:15,  3.09s/it]\u001b[A\n",
            " 78% 157/200 [08:01<02:13,  3.09s/it]\u001b[A\n",
            " 79% 158/200 [08:04<02:09,  3.09s/it]\u001b[A\n",
            " 80% 159/200 [08:07<02:06,  3.09s/it]\u001b[A\n",
            " 80% 160/200 [08:10<02:03,  3.09s/it]\u001b[A\n",
            " 80% 161/200 [08:13<02:00,  3.09s/it]\u001b[A\n",
            " 81% 162/200 [08:16<01:57,  3.08s/it]\u001b[A\n",
            " 82% 163/200 [08:19<01:54,  3.09s/it]\u001b[A\n",
            " 82% 164/200 [08:22<01:51,  3.09s/it]\u001b[A\n",
            " 82% 165/200 [08:25<01:48,  3.09s/it]\u001b[A\n",
            " 83% 166/200 [08:28<01:45,  3.09s/it]\u001b[A\n",
            " 84% 167/200 [08:31<01:41,  3.08s/it]\u001b[A\n",
            " 84% 168/200 [08:34<01:38,  3.08s/it]\u001b[A\n",
            " 84% 169/200 [08:38<01:35,  3.09s/it]\u001b[A\n",
            " 85% 170/200 [08:41<01:32,  3.08s/it]\u001b[A\n",
            " 86% 171/200 [08:44<01:29,  3.08s/it]\u001b[A\n",
            " 86% 172/200 [08:47<01:26,  3.09s/it]\u001b[A\n",
            " 86% 173/200 [08:50<01:23,  3.09s/it]\u001b[A\n",
            " 87% 174/200 [08:53<01:20,  3.08s/it]\u001b[A\n",
            " 88% 175/200 [08:56<01:17,  3.09s/it]\u001b[A\n",
            " 88% 176/200 [08:59<01:13,  3.08s/it]\u001b[A\n",
            " 88% 177/200 [09:02<01:10,  3.07s/it]\u001b[A\n",
            " 89% 178/200 [09:05<01:07,  3.07s/it]\u001b[A\n",
            " 90% 179/200 [09:08<01:04,  3.07s/it]\u001b[A\n",
            " 90% 180/200 [09:11<01:01,  3.07s/it]\u001b[A\n",
            " 90% 181/200 [09:14<00:58,  3.08s/it]\u001b[A\n",
            " 91% 182/200 [09:18<00:55,  3.08s/it]\u001b[A\n",
            " 92% 183/200 [09:21<00:52,  3.08s/it]\u001b[A\n",
            " 92% 184/200 [09:24<00:49,  3.09s/it]\u001b[A\n",
            " 92% 185/200 [09:27<00:46,  3.08s/it]\u001b[A\n",
            " 93% 186/200 [09:30<00:43,  3.07s/it]\u001b[A\n",
            " 94% 187/200 [09:33<00:40,  3.08s/it]\u001b[A\n",
            " 94% 188/200 [09:36<00:36,  3.08s/it]\u001b[A\n",
            " 94% 189/200 [09:39<00:33,  3.08s/it]\u001b[A\n",
            " 95% 190/200 [09:42<00:30,  3.09s/it]\u001b[A\n",
            " 96% 191/200 [09:45<00:27,  3.08s/it]\u001b[A\n",
            " 96% 192/200 [09:48<00:24,  3.08s/it]\u001b[A\n",
            " 96% 193/200 [09:51<00:21,  3.09s/it]\u001b[A\n",
            " 97% 194/200 [09:55<00:18,  3.09s/it]\u001b[A\n",
            " 98% 195/200 [09:58<00:15,  3.09s/it]\u001b[A\n",
            " 98% 196/200 [10:01<00:12,  3.10s/it]\u001b[A\n",
            " 98% 197/200 [10:04<00:09,  3.10s/it]\u001b[A\n",
            " 99% 198/200 [10:07<00:06,  3.10s/it]\u001b[A\n",
            "100% 199/200 [10:10<00:03,  3.11s/it]\u001b[A[2024-06-04 04:41:17,366] [INFO] [accelerate.accelerator.gather_for_metrics:2380] [PID:3985] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.\n",
            "\n",
            "                                  \n",
            "\u001b[A{'eval_loss': 1.4038130044937134, 'eval_runtime': 616.7183, 'eval_samples_per_second': 0.324, 'eval_steps_per_second': 0.324, 'epoch': 0.02}\n",
            "  2% 1/53 [10:45<25:21, 29.27s/it]\n",
            "100% 200/200 [10:15<00:00,  3.11s/it]\u001b[A\n",
            "                                     \u001b[A[2024-06-04 04:41:50,699] [INFO] [axolotl.callbacks.on_step_end:126] [PID:3985] [RANK:0] GPU memory usage while training: 2.940GB (+8.185GB cache, +0.386GB misc)\u001b[39m\n",
            "{'loss': 4.0528, 'grad_norm': 0.4140625, 'learning_rate': 8e-05, 'epoch': 0.04}\n",
            "{'loss': 4.1224, 'grad_norm': 0.400390625, 'learning_rate': 0.00012, 'epoch': 0.06}\n",
            "{'loss': 3.793, 'grad_norm': 0.484375, 'learning_rate': 0.00016, 'epoch': 0.07}\n",
            "{'loss': 4.1102, 'grad_norm': 0.52734375, 'learning_rate': 0.0002, 'epoch': 0.09}\n",
            "{'loss': 3.9811, 'grad_norm': 0.5078125, 'learning_rate': 0.00019978589232386035, 'epoch': 0.11}\n",
            "{'loss': 4.106, 'grad_norm': 0.6015625, 'learning_rate': 0.00019914448613738106, 'epoch': 0.13}\n",
            "{'loss': 3.5104, 'grad_norm': 0.7421875, 'learning_rate': 0.00019807852804032305, 'epoch': 0.15}\n",
            "{'loss': 4.0803, 'grad_norm': 0.8359375, 'learning_rate': 0.00019659258262890683, 'epoch': 0.17}\n",
            "{'loss': 3.7566, 'grad_norm': 0.828125, 'learning_rate': 0.0001946930129495106, 'epoch': 0.19}\n",
            "{'loss': 3.4785, 'grad_norm': 1.03125, 'learning_rate': 0.0001923879532511287, 'epoch': 0.2}\n",
            "{'loss': 3.5915, 'grad_norm': 0.9765625, 'learning_rate': 0.00018968727415326884, 'epoch': 0.22}\n",
            "{'loss': 3.3392, 'grad_norm': 0.88671875, 'learning_rate': 0.00018660254037844388, 'epoch': 0.24}\n",
            "{'loss': 3.504, 'grad_norm': 0.9921875, 'learning_rate': 0.00018314696123025454, 'epoch': 0.26}\n",
            " 26% 14/53 [17:39<22:20, 34.37s/it]\n",
            "  0% 0/200 [00:00<?, ?it/s]\u001b[A\n",
            "  1% 2/200 [00:03<05:07,  1.55s/it]\u001b[A\n",
            "  2% 3/200 [00:06<07:12,  2.19s/it]\u001b[A\n",
            "  2% 4/200 [00:09<08:18,  2.54s/it]\u001b[A\n",
            "  2% 5/200 [00:12<08:51,  2.73s/it]\u001b[A\n",
            "  3% 6/200 [00:15<09:13,  2.86s/it]\u001b[A\n",
            "  4% 7/200 [00:18<09:28,  2.94s/it]\u001b[A\n",
            "  4% 8/200 [00:21<09:34,  2.99s/it]\u001b[A\n",
            "  4% 9/200 [00:24<09:39,  3.03s/it]\u001b[A\n",
            "  5% 10/200 [00:27<09:41,  3.06s/it]\u001b[A\n",
            "  6% 11/200 [00:31<09:39,  3.07s/it]\u001b[A\n",
            "  6% 12/200 [00:34<09:37,  3.07s/it]\u001b[A\n",
            "  6% 13/200 [00:37<09:36,  3.08s/it]\u001b[A\n",
            "  7% 14/200 [00:40<09:34,  3.09s/it]\u001b[A\n",
            "  8% 15/200 [00:43<09:31,  3.09s/it]\u001b[A\n",
            "  8% 16/200 [00:46<09:28,  3.09s/it]\u001b[A\n",
            "  8% 17/200 [00:49<09:26,  3.09s/it]\u001b[A\n",
            "  9% 18/200 [00:52<09:25,  3.11s/it]\u001b[A\n",
            " 10% 19/200 [00:55<09:23,  3.12s/it]\u001b[A\n",
            " 10% 20/200 [00:59<09:22,  3.12s/it]\u001b[A\n",
            " 10% 21/200 [01:02<09:19,  3.13s/it]\u001b[A\n",
            " 11% 22/200 [01:05<09:16,  3.13s/it]\u001b[A\n",
            " 12% 23/200 [01:08<09:12,  3.12s/it]\u001b[A\n",
            " 12% 24/200 [01:11<09:06,  3.11s/it]\u001b[A\n",
            " 12% 25/200 [01:14<09:03,  3.10s/it]\u001b[A\n",
            " 13% 26/200 [01:17<09:00,  3.11s/it]\u001b[A\n",
            " 14% 27/200 [01:20<08:57,  3.11s/it]\u001b[A\n",
            " 14% 28/200 [01:23<08:55,  3.11s/it]\u001b[A\n",
            " 14% 29/200 [01:27<08:49,  3.10s/it]\u001b[A\n",
            " 15% 30/200 [01:30<08:45,  3.09s/it]\u001b[A\n",
            " 16% 31/200 [01:33<08:43,  3.10s/it]\u001b[A\n",
            " 16% 32/200 [01:36<08:41,  3.10s/it]\u001b[A\n",
            " 16% 33/200 [01:39<08:38,  3.10s/it]\u001b[A\n",
            " 17% 34/200 [01:42<08:36,  3.11s/it]\u001b[A\n",
            " 18% 35/200 [01:45<08:31,  3.10s/it]\u001b[A\n",
            " 18% 36/200 [01:48<08:28,  3.10s/it]\u001b[A\n",
            " 18% 37/200 [01:51<08:26,  3.10s/it]\u001b[A\n",
            " 19% 38/200 [01:54<08:21,  3.10s/it]\u001b[A\n",
            " 20% 39/200 [01:58<08:17,  3.09s/it]\u001b[A\n",
            " 20% 40/200 [02:01<08:16,  3.10s/it]\u001b[A\n",
            " 20% 41/200 [02:04<08:12,  3.10s/it]\u001b[A\n",
            " 21% 42/200 [02:07<08:08,  3.09s/it]\u001b[A\n",
            " 22% 43/200 [02:10<08:06,  3.10s/it]\u001b[A\n",
            " 22% 44/200 [02:13<08:02,  3.09s/it]\u001b[A\n",
            " 22% 45/200 [02:16<07:58,  3.09s/it]\u001b[A\n",
            " 23% 46/200 [02:19<07:56,  3.10s/it]\u001b[A\n",
            " 24% 47/200 [02:22<07:52,  3.09s/it]\u001b[A\n",
            " 24% 48/200 [02:25<07:50,  3.10s/it]\u001b[A\n",
            " 24% 49/200 [02:29<07:49,  3.11s/it]\u001b[A\n",
            " 25% 50/200 [02:32<07:45,  3.10s/it]\u001b[A\n",
            " 26% 51/200 [02:35<07:41,  3.09s/it]\u001b[A\n",
            " 26% 52/200 [02:38<07:39,  3.10s/it]\u001b[A\n",
            " 26% 53/200 [02:41<07:36,  3.11s/it]\u001b[A\n",
            " 27% 54/200 [02:44<07:33,  3.10s/it]\u001b[A\n",
            " 28% 55/200 [02:47<07:29,  3.10s/it]\u001b[A\n",
            " 28% 56/200 [02:50<07:25,  3.09s/it]\u001b[A\n",
            " 28% 57/200 [02:53<07:22,  3.09s/it]\u001b[A\n",
            " 29% 58/200 [02:56<07:20,  3.10s/it]\u001b[A\n",
            " 30% 59/200 [02:59<07:14,  3.08s/it]\u001b[A\n",
            " 30% 60/200 [03:03<07:11,  3.08s/it]\u001b[A\n",
            " 30% 61/200 [03:06<07:08,  3.09s/it]\u001b[A\n",
            " 31% 62/200 [03:09<07:04,  3.08s/it]\u001b[A\n",
            " 32% 63/200 [03:12<07:02,  3.08s/it]\u001b[A\n",
            " 32% 64/200 [03:15<07:00,  3.10s/it]\u001b[A\n",
            " 32% 65/200 [03:18<06:56,  3.09s/it]\u001b[A\n",
            " 33% 66/200 [03:21<06:53,  3.08s/it]\u001b[A\n",
            " 34% 67/200 [03:24<06:51,  3.10s/it]\u001b[A\n",
            " 34% 68/200 [03:27<06:49,  3.10s/it]\u001b[A\n",
            " 34% 69/200 [03:30<06:46,  3.10s/it]\u001b[A\n",
            " 35% 70/200 [03:33<06:44,  3.11s/it]\u001b[A\n",
            " 36% 71/200 [03:37<06:41,  3.12s/it]\u001b[A\n",
            " 36% 72/200 [03:40<06:40,  3.13s/it]\u001b[A\n",
            " 36% 73/200 [03:43<06:37,  3.13s/it]\u001b[A\n",
            " 37% 74/200 [03:46<06:33,  3.12s/it]\u001b[A\n",
            " 38% 75/200 [03:49<06:29,  3.12s/it]\u001b[A\n",
            " 38% 76/200 [03:52<06:27,  3.12s/it]\u001b[A\n",
            " 38% 77/200 [03:55<06:23,  3.12s/it]\u001b[A\n",
            " 39% 78/200 [03:58<06:18,  3.10s/it]\u001b[A\n",
            " 40% 79/200 [04:02<06:16,  3.11s/it]\u001b[A\n",
            " 40% 80/200 [04:05<06:12,  3.10s/it]\u001b[A\n",
            " 40% 81/200 [04:08<06:07,  3.09s/it]\u001b[A\n",
            " 41% 82/200 [04:11<06:06,  3.10s/it]\u001b[A\n",
            " 42% 83/200 [04:14<06:03,  3.10s/it]\u001b[A\n",
            " 42% 84/200 [04:17<05:59,  3.10s/it]\u001b[A\n",
            " 42% 85/200 [04:20<05:56,  3.10s/it]\u001b[A\n",
            " 43% 86/200 [04:23<05:52,  3.10s/it]\u001b[A\n",
            " 44% 87/200 [04:26<05:49,  3.10s/it]\u001b[A\n",
            " 44% 88/200 [04:29<05:47,  3.10s/it]\u001b[A\n",
            " 44% 89/200 [04:33<05:43,  3.10s/it]\u001b[A\n",
            " 45% 90/200 [04:36<05:40,  3.10s/it]\u001b[A\n",
            " 46% 91/200 [04:39<05:38,  3.11s/it]\u001b[A\n",
            " 46% 92/200 [04:42<05:34,  3.10s/it]\u001b[A\n",
            " 46% 93/200 [04:45<05:31,  3.10s/it]\u001b[A\n",
            " 47% 94/200 [04:48<05:29,  3.11s/it]\u001b[A\n",
            " 48% 95/200 [04:51<05:26,  3.11s/it]\u001b[A\n",
            " 48% 96/200 [04:54<05:23,  3.11s/it]\u001b[A\n",
            " 48% 97/200 [04:57<05:21,  3.12s/it]\u001b[A\n",
            " 49% 98/200 [05:01<05:18,  3.12s/it]\u001b[A\n",
            " 50% 99/200 [05:04<05:14,  3.11s/it]\u001b[A\n",
            " 50% 100/200 [05:07<05:11,  3.12s/it]\u001b[A\n",
            " 50% 101/200 [05:10<05:09,  3.13s/it]\u001b[A\n",
            " 51% 102/200 [05:13<05:06,  3.13s/it]\u001b[A\n",
            " 52% 103/200 [05:16<05:03,  3.13s/it]\u001b[A\n",
            " 52% 104/200 [05:19<04:59,  3.12s/it]\u001b[A\n",
            " 52% 105/200 [05:22<04:56,  3.12s/it]\u001b[A\n",
            " 53% 106/200 [05:26<04:54,  3.13s/it]\u001b[A\n",
            " 54% 107/200 [05:29<04:50,  3.12s/it]\u001b[A\n",
            " 54% 108/200 [05:32<04:47,  3.12s/it]\u001b[A\n",
            " 55% 109/200 [05:35<04:44,  3.13s/it]\u001b[A\n",
            " 55% 110/200 [05:38<04:41,  3.12s/it]\u001b[A\n",
            " 56% 111/200 [05:41<04:37,  3.12s/it]\u001b[A\n",
            " 56% 112/200 [05:44<04:34,  3.12s/it]\u001b[A\n",
            " 56% 113/200 [05:47<04:31,  3.12s/it]\u001b[A\n",
            " 57% 114/200 [05:51<04:28,  3.12s/it]\u001b[A\n",
            " 57% 115/200 [05:54<04:26,  3.13s/it]\u001b[A\n",
            " 58% 116/200 [05:57<04:21,  3.12s/it]\u001b[A\n",
            " 58% 117/200 [06:00<04:18,  3.11s/it]\u001b[A\n",
            " 59% 118/200 [06:03<04:15,  3.12s/it]\u001b[A\n",
            " 60% 119/200 [06:06<04:12,  3.12s/it]\u001b[A\n",
            " 60% 120/200 [06:09<04:09,  3.11s/it]\u001b[A\n",
            " 60% 121/200 [06:12<04:06,  3.12s/it]\u001b[A\n",
            " 61% 122/200 [06:15<04:02,  3.11s/it]\u001b[A\n",
            " 62% 123/200 [06:19<03:59,  3.11s/it]\u001b[A\n",
            " 62% 124/200 [06:22<03:57,  3.12s/it]\u001b[A\n",
            " 62% 125/200 [06:25<03:53,  3.12s/it]\u001b[A\n",
            " 63% 126/200 [06:28<03:50,  3.11s/it]\u001b[A\n",
            " 64% 127/200 [06:31<03:47,  3.11s/it]\u001b[A\n",
            " 64% 128/200 [06:34<03:43,  3.11s/it]\u001b[A\n",
            " 64% 129/200 [06:37<03:40,  3.11s/it]\u001b[A\n",
            " 65% 130/200 [06:40<03:37,  3.11s/it]\u001b[A\n",
            " 66% 131/200 [06:43<03:34,  3.10s/it]\u001b[A\n",
            " 66% 132/200 [06:47<03:30,  3.10s/it]\u001b[A\n",
            " 66% 133/200 [06:50<03:28,  3.11s/it]\u001b[A\n",
            " 67% 134/200 [06:53<03:25,  3.11s/it]\u001b[A\n",
            " 68% 135/200 [06:56<03:22,  3.11s/it]\u001b[A\n",
            " 68% 136/200 [06:59<03:19,  3.12s/it]\u001b[A\n",
            " 68% 137/200 [07:02<03:15,  3.11s/it]\u001b[A\n",
            " 69% 138/200 [07:05<03:12,  3.10s/it]\u001b[A\n",
            " 70% 139/200 [07:08<03:09,  3.11s/it]\u001b[A\n",
            " 70% 140/200 [07:11<03:06,  3.10s/it]\u001b[A\n",
            " 70% 141/200 [07:15<03:03,  3.11s/it]\u001b[A\n",
            " 71% 142/200 [07:18<03:01,  3.13s/it]\u001b[A\n",
            " 72% 143/200 [07:21<02:57,  3.12s/it]\u001b[A\n",
            " 72% 144/200 [07:24<02:54,  3.12s/it]\u001b[A\n",
            " 72% 145/200 [07:27<02:51,  3.12s/it]\u001b[A\n",
            " 73% 146/200 [07:30<02:48,  3.13s/it]\u001b[A\n",
            " 74% 147/200 [07:33<02:45,  3.13s/it]\u001b[A\n",
            " 74% 148/200 [07:36<02:42,  3.13s/it]\u001b[A\n",
            " 74% 149/200 [07:40<02:39,  3.13s/it]\u001b[A\n",
            " 75% 150/200 [07:43<02:36,  3.13s/it]\u001b[A\n",
            " 76% 151/200 [07:46<02:34,  3.14s/it]\u001b[A\n",
            " 76% 152/200 [07:49<02:30,  3.13s/it]\u001b[A\n",
            " 76% 153/200 [07:52<02:26,  3.13s/it]\u001b[A\n",
            " 77% 154/200 [07:55<02:23,  3.12s/it]\u001b[A\n",
            " 78% 155/200 [07:58<02:19,  3.11s/it]\u001b[A\n",
            " 78% 156/200 [08:01<02:16,  3.10s/it]\u001b[A\n",
            " 78% 157/200 [08:05<02:13,  3.11s/it]\u001b[A\n",
            " 79% 158/200 [08:08<02:10,  3.10s/it]\u001b[A\n",
            " 80% 159/200 [08:11<02:07,  3.10s/it]\u001b[A\n",
            " 80% 160/200 [08:14<02:04,  3.11s/it]\u001b[A\n",
            " 80% 161/200 [08:17<02:00,  3.10s/it]\u001b[A\n",
            " 81% 162/200 [08:20<01:57,  3.10s/it]\u001b[A\n",
            " 82% 163/200 [08:23<01:54,  3.11s/it]\u001b[A\n",
            " 82% 164/200 [08:26<01:51,  3.11s/it]\u001b[A\n",
            " 82% 165/200 [08:29<01:48,  3.11s/it]\u001b[A\n",
            " 83% 166/200 [08:32<01:45,  3.12s/it]\u001b[A\n",
            " 84% 167/200 [08:36<01:42,  3.11s/it]\u001b[A\n",
            " 84% 168/200 [08:39<01:39,  3.11s/it]\u001b[A\n",
            " 84% 169/200 [08:42<01:36,  3.12s/it]\u001b[A\n",
            " 85% 170/200 [08:45<01:33,  3.11s/it]\u001b[A\n",
            " 86% 171/200 [08:48<01:30,  3.11s/it]\u001b[A\n",
            " 86% 172/200 [08:51<01:27,  3.12s/it]\u001b[A\n",
            " 86% 173/200 [08:54<01:24,  3.12s/it]\u001b[A\n",
            " 87% 174/200 [08:57<01:20,  3.11s/it]\u001b[A\n",
            " 88% 175/200 [09:00<01:17,  3.11s/it]\u001b[A\n",
            " 88% 176/200 [09:04<01:14,  3.10s/it]\u001b[A\n",
            " 88% 177/200 [09:07<01:11,  3.11s/it]\u001b[A\n",
            " 89% 178/200 [09:10<01:08,  3.11s/it]\u001b[A\n",
            " 90% 179/200 [09:13<01:05,  3.10s/it]\u001b[A\n",
            " 90% 180/200 [09:16<01:02,  3.10s/it]\u001b[A\n",
            " 90% 181/200 [09:19<00:59,  3.12s/it]\u001b[A\n",
            " 91% 182/200 [09:22<00:55,  3.11s/it]\u001b[A\n",
            " 92% 183/200 [09:25<00:52,  3.11s/it]\u001b[A\n",
            " 92% 184/200 [09:28<00:49,  3.11s/it]\u001b[A\n",
            " 92% 185/200 [09:32<00:46,  3.11s/it]\u001b[A\n",
            " 93% 186/200 [09:35<00:43,  3.10s/it]\u001b[A\n",
            " 94% 187/200 [09:38<00:40,  3.11s/it]\u001b[A\n",
            " 94% 188/200 [09:41<00:37,  3.11s/it]\u001b[A\n",
            " 94% 189/200 [09:44<00:34,  3.10s/it]\u001b[A\n",
            " 95% 190/200 [09:47<00:31,  3.11s/it]\u001b[A\n",
            " 96% 191/200 [09:50<00:27,  3.10s/it]\u001b[A\n",
            " 96% 192/200 [09:53<00:24,  3.09s/it]\u001b[A\n",
            " 96% 193/200 [09:56<00:21,  3.10s/it]\u001b[A\n",
            " 97% 194/200 [09:59<00:18,  3.10s/it]\u001b[A\n",
            " 98% 195/200 [10:03<00:15,  3.10s/it]\u001b[A\n",
            " 98% 196/200 [10:06<00:12,  3.11s/it]\u001b[A\n",
            " 98% 197/200 [10:09<00:09,  3.12s/it]\u001b[A\n",
            " 99% 198/200 [10:12<00:06,  3.12s/it]\u001b[A\n",
            "100% 199/200 [10:15<00:03,  3.13s/it]\u001b[A[2024-06-04 04:58:32,542] [INFO] [accelerate.accelerator.gather_for_metrics:2380] [PID:3985] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.\n",
            "\n",
            "                                   \n",
            "\u001b[A{'eval_loss': 1.3955519199371338, 'eval_runtime': 621.7745, 'eval_samples_per_second': 0.322, 'eval_steps_per_second': 0.322, 'epoch': 0.26}\n",
            " 26% 14/53 [28:01<22:20, 34.37s/it]\n",
            "100% 200/200 [10:20<00:00,  3.12s/it]\u001b[A\n",
            "{'loss': 3.6185, 'grad_norm': 1.0234375, 'learning_rate': 0.00017933533402912354, 'epoch': 0.28}\n",
            "{'loss': 3.2849, 'grad_norm': 1.203125, 'learning_rate': 0.00017518398074789775, 'epoch': 0.3}\n",
            "{'loss': 3.1333, 'grad_norm': 1.0, 'learning_rate': 0.00017071067811865476, 'epoch': 0.32}\n",
            "{'loss': 3.0694, 'grad_norm': 1.0546875, 'learning_rate': 0.00016593458151000688, 'epoch': 0.34}\n",
            "{'loss': 3.0776, 'grad_norm': 1.0, 'learning_rate': 0.00016087614290087208, 'epoch': 0.35}\n",
            "{'loss': 2.9014, 'grad_norm': 1.109375, 'learning_rate': 0.00015555702330196023, 'epoch': 0.37}\n",
            "{'loss': 2.8639, 'grad_norm': 1.1640625, 'learning_rate': 0.00015000000000000001, 'epoch': 0.39}\n",
            "{'loss': 2.8936, 'grad_norm': 1.15625, 'learning_rate': 0.00014422886902190014, 'epoch': 0.41}\n",
            "{'loss': 2.5632, 'grad_norm': 1.2578125, 'learning_rate': 0.000138268343236509, 'epoch': 0.43}\n",
            "{'loss': 2.7037, 'grad_norm': 1.2578125, 'learning_rate': 0.00013214394653031616, 'epoch': 0.45}\n",
            "{'loss': 2.6, 'grad_norm': 1.1484375, 'learning_rate': 0.00012588190451025207, 'epoch': 0.47}\n",
            "{'loss': 2.4671, 'grad_norm': 1.1796875, 'learning_rate': 0.00011950903220161285, 'epoch': 0.48}\n",
            "{'loss': 2.5941, 'grad_norm': 1.28125, 'learning_rate': 0.00011305261922200519, 'epoch': 0.5}\n",
            "{'loss': 2.2938, 'grad_norm': 1.3984375, 'learning_rate': 0.00010654031292301432, 'epoch': 0.52}\n",
            " 53% 28/53 [35:25<14:01, 33.66s/it]\n",
            "  0% 0/200 [00:00<?, ?it/s]\u001b[A\n",
            "  1% 2/200 [00:03<05:08,  1.56s/it]\u001b[A\n",
            "  2% 3/200 [00:06<07:14,  2.21s/it]\u001b[A\n",
            "  2% 4/200 [00:09<08:20,  2.55s/it]\u001b[A\n",
            "  2% 5/200 [00:12<08:55,  2.75s/it]\u001b[A\n",
            "  3% 6/200 [00:15<09:15,  2.87s/it]\u001b[A\n",
            "  4% 7/200 [00:18<09:31,  2.96s/it]\u001b[A\n",
            "  4% 8/200 [00:21<09:39,  3.02s/it]\u001b[A\n",
            "  4% 9/200 [00:25<09:43,  3.05s/it]\u001b[A\n",
            "  5% 10/200 [00:28<09:45,  3.08s/it]\u001b[A\n",
            "  6% 11/200 [00:31<09:41,  3.08s/it]\u001b[A\n",
            "  6% 12/200 [00:34<09:38,  3.08s/it]\u001b[A\n",
            "  6% 13/200 [00:37<09:38,  3.09s/it]\u001b[A\n",
            "  7% 14/200 [00:40<09:35,  3.09s/it]\u001b[A\n",
            "  8% 15/200 [00:43<09:32,  3.10s/it]\u001b[A\n",
            "  8% 16/200 [00:46<09:30,  3.10s/it]\u001b[A\n",
            "  8% 17/200 [00:49<09:28,  3.11s/it]\u001b[A\n",
            "  9% 18/200 [00:52<09:25,  3.11s/it]\u001b[A\n",
            " 10% 19/200 [00:56<09:24,  3.12s/it]\u001b[A\n",
            " 10% 20/200 [00:59<09:22,  3.12s/it]\u001b[A\n",
            " 10% 21/200 [01:02<09:18,  3.12s/it]\u001b[A\n",
            " 11% 22/200 [01:05<09:16,  3.12s/it]\u001b[A\n",
            " 12% 23/200 [01:08<09:12,  3.12s/it]\u001b[A\n",
            " 12% 24/200 [01:11<09:09,  3.12s/it]\u001b[A\n",
            " 12% 25/200 [01:14<09:06,  3.13s/it]\u001b[A\n",
            " 13% 26/200 [01:18<09:05,  3.14s/it]\u001b[A\n",
            " 14% 27/200 [01:21<09:03,  3.14s/it]\u001b[A\n",
            " 14% 28/200 [01:24<09:01,  3.15s/it]\u001b[A\n",
            " 14% 29/200 [01:27<08:55,  3.13s/it]\u001b[A\n",
            " 15% 30/200 [01:30<08:51,  3.13s/it]\u001b[A\n",
            " 16% 31/200 [01:33<08:49,  3.13s/it]\u001b[A\n",
            " 16% 32/200 [01:36<08:44,  3.12s/it]\u001b[A\n",
            " 16% 33/200 [01:39<08:40,  3.12s/it]\u001b[A\n",
            " 17% 34/200 [01:43<08:38,  3.12s/it]\u001b[A\n",
            " 18% 35/200 [01:46<08:34,  3.12s/it]\u001b[A\n",
            " 18% 36/200 [01:49<08:30,  3.11s/it]\u001b[A\n",
            " 18% 37/200 [01:52<08:26,  3.11s/it]\u001b[A\n",
            " 19% 38/200 [01:55<08:21,  3.10s/it]\u001b[A\n",
            " 20% 39/200 [01:58<08:19,  3.10s/it]\u001b[A\n",
            " 20% 40/200 [02:01<08:17,  3.11s/it]\u001b[A\n",
            " 20% 41/200 [02:04<08:13,  3.10s/it]\u001b[A\n",
            " 21% 42/200 [02:07<08:09,  3.10s/it]\u001b[A\n",
            " 22% 43/200 [02:10<08:06,  3.10s/it]\u001b[A\n",
            " 22% 44/200 [02:14<08:02,  3.09s/it]\u001b[A\n",
            " 22% 45/200 [02:17<07:59,  3.09s/it]\u001b[A\n",
            " 23% 46/200 [02:20<07:56,  3.10s/it]\u001b[A\n",
            " 24% 47/200 [02:23<07:53,  3.10s/it]\u001b[A\n",
            " 24% 48/200 [02:26<07:50,  3.10s/it]\u001b[A\n",
            " 24% 49/200 [02:29<07:49,  3.11s/it]\u001b[A\n",
            " 25% 50/200 [02:32<07:44,  3.10s/it]\u001b[A\n",
            " 26% 51/200 [02:35<07:40,  3.09s/it]\u001b[A\n",
            " 26% 52/200 [02:38<07:39,  3.10s/it]\u001b[A\n",
            " 26% 53/200 [02:41<07:38,  3.12s/it]\u001b[A\n",
            " 27% 54/200 [02:45<07:34,  3.11s/it]\u001b[A\n",
            " 28% 55/200 [02:48<07:31,  3.11s/it]\u001b[A\n",
            " 28% 56/200 [02:51<07:28,  3.11s/it]\u001b[A\n",
            " 28% 57/200 [02:54<07:26,  3.12s/it]\u001b[A\n",
            " 29% 58/200 [02:57<07:23,  3.12s/it]\u001b[A\n",
            " 30% 59/200 [03:00<07:18,  3.11s/it]\u001b[A\n",
            " 30% 60/200 [03:03<07:14,  3.10s/it]\u001b[A\n",
            " 30% 61/200 [03:06<07:11,  3.10s/it]\u001b[A\n",
            " 31% 62/200 [03:09<07:07,  3.10s/it]\u001b[A\n",
            " 32% 63/200 [03:13<07:04,  3.10s/it]\u001b[A\n",
            " 32% 64/200 [03:16<07:02,  3.11s/it]\u001b[A\n",
            " 32% 65/200 [03:19<06:58,  3.10s/it]\u001b[A\n",
            " 33% 66/200 [03:22<06:55,  3.10s/it]\u001b[A\n",
            " 34% 67/200 [03:25<06:53,  3.11s/it]\u001b[A\n",
            " 34% 68/200 [03:28<06:50,  3.11s/it]\u001b[A\n",
            " 34% 69/200 [03:31<06:46,  3.11s/it]\u001b[A\n",
            " 35% 70/200 [03:34<06:44,  3.11s/it]\u001b[A\n",
            " 36% 71/200 [03:37<06:41,  3.11s/it]\u001b[A\n",
            " 36% 72/200 [03:41<06:39,  3.12s/it]\u001b[A\n",
            " 36% 73/200 [03:44<06:36,  3.12s/it]\u001b[A\n",
            " 37% 74/200 [03:47<06:32,  3.11s/it]\u001b[A\n",
            " 38% 75/200 [03:50<06:28,  3.10s/it]\u001b[A\n",
            " 38% 76/200 [03:53<06:25,  3.11s/it]\u001b[A\n",
            " 38% 77/200 [03:56<06:22,  3.11s/it]\u001b[A\n",
            " 39% 78/200 [03:59<06:19,  3.11s/it]\u001b[A\n",
            " 40% 79/200 [04:02<06:16,  3.11s/it]\u001b[A\n",
            " 40% 80/200 [04:05<06:12,  3.11s/it]\u001b[A\n",
            " 40% 81/200 [04:08<06:09,  3.10s/it]\u001b[A\n",
            " 41% 82/200 [04:12<06:06,  3.11s/it]\u001b[A\n",
            " 42% 83/200 [04:15<06:03,  3.10s/it]\u001b[A\n",
            " 42% 84/200 [04:18<05:59,  3.10s/it]\u001b[A\n",
            " 42% 85/200 [04:21<05:56,  3.10s/it]\u001b[A\n",
            " 43% 86/200 [04:24<05:52,  3.10s/it]\u001b[A\n",
            " 44% 87/200 [04:27<05:49,  3.09s/it]\u001b[A\n",
            " 44% 88/200 [04:30<05:46,  3.09s/it]\u001b[A\n",
            " 44% 89/200 [04:33<05:42,  3.08s/it]\u001b[A\n",
            " 45% 90/200 [04:36<05:38,  3.08s/it]\u001b[A\n",
            " 46% 91/200 [04:39<05:36,  3.09s/it]\u001b[A\n",
            " 46% 92/200 [04:42<05:33,  3.09s/it]\u001b[A\n",
            " 46% 93/200 [04:46<05:30,  3.09s/it]\u001b[A\n",
            " 47% 94/200 [04:49<05:28,  3.10s/it]\u001b[A\n",
            " 48% 95/200 [04:52<05:25,  3.10s/it]\u001b[A\n",
            " 48% 96/200 [04:55<05:23,  3.11s/it]\u001b[A\n",
            " 48% 97/200 [04:58<05:21,  3.12s/it]\u001b[A\n",
            " 49% 98/200 [05:01<05:18,  3.12s/it]\u001b[A\n",
            " 50% 99/200 [05:04<05:14,  3.11s/it]\u001b[A\n",
            " 50% 100/200 [05:07<05:12,  3.12s/it]\u001b[A\n",
            " 50% 101/200 [05:11<05:09,  3.13s/it]\u001b[A\n",
            " 51% 102/200 [05:14<05:07,  3.13s/it]\u001b[A\n",
            " 52% 103/200 [05:17<05:04,  3.14s/it]\u001b[A\n",
            " 52% 104/200 [05:20<05:00,  3.13s/it]\u001b[A\n",
            " 52% 105/200 [05:23<04:57,  3.13s/it]\u001b[A\n",
            " 53% 106/200 [05:26<04:55,  3.14s/it]\u001b[A\n",
            " 54% 107/200 [05:29<04:51,  3.13s/it]\u001b[A\n",
            " 54% 108/200 [05:33<04:48,  3.13s/it]\u001b[A\n",
            " 55% 109/200 [05:36<04:44,  3.13s/it]\u001b[A\n",
            " 55% 110/200 [05:39<04:40,  3.11s/it]\u001b[A\n",
            " 56% 111/200 [05:42<04:36,  3.11s/it]\u001b[A\n",
            " 56% 112/200 [05:45<04:34,  3.12s/it]\u001b[A\n",
            " 56% 113/200 [05:48<04:30,  3.11s/it]\u001b[A\n",
            " 57% 114/200 [05:51<04:27,  3.12s/it]\u001b[A\n",
            " 57% 115/200 [05:54<04:25,  3.12s/it]\u001b[A\n",
            " 58% 116/200 [05:57<04:21,  3.11s/it]\u001b[A\n",
            " 58% 117/200 [06:00<04:17,  3.10s/it]\u001b[A\n",
            " 59% 118/200 [06:04<04:15,  3.11s/it]\u001b[A\n",
            " 60% 119/200 [06:07<04:11,  3.11s/it]\u001b[A\n",
            " 60% 120/200 [06:10<04:07,  3.10s/it]\u001b[A\n",
            " 60% 121/200 [06:13<04:05,  3.10s/it]\u001b[A\n",
            " 61% 122/200 [06:16<04:02,  3.10s/it]\u001b[A\n",
            " 62% 123/200 [06:19<03:58,  3.10s/it]\u001b[A\n",
            " 62% 124/200 [06:22<03:56,  3.11s/it]\u001b[A\n",
            " 62% 125/200 [06:25<03:53,  3.11s/it]\u001b[A\n",
            " 63% 126/200 [06:28<03:50,  3.12s/it]\u001b[A\n",
            " 64% 127/200 [06:32<03:47,  3.12s/it]\u001b[A\n",
            " 64% 128/200 [06:35<03:42,  3.10s/it]\u001b[A\n",
            " 64% 129/200 [06:38<03:39,  3.09s/it]\u001b[A\n",
            " 65% 130/200 [06:41<03:37,  3.10s/it]\u001b[A\n",
            " 66% 131/200 [06:44<03:33,  3.10s/it]\u001b[A\n",
            " 66% 132/200 [06:47<03:30,  3.10s/it]\u001b[A\n",
            " 66% 133/200 [06:50<03:28,  3.11s/it]\u001b[A\n",
            " 67% 134/200 [06:53<03:24,  3.11s/it]\u001b[A\n",
            " 68% 135/200 [06:56<03:21,  3.11s/it]\u001b[A\n",
            " 68% 136/200 [06:59<03:18,  3.11s/it]\u001b[A\n",
            " 68% 137/200 [07:03<03:15,  3.10s/it]\u001b[A\n",
            " 69% 138/200 [07:06<03:12,  3.10s/it]\u001b[A\n",
            " 70% 139/200 [07:09<03:09,  3.10s/it]\u001b[A\n",
            " 70% 140/200 [07:12<03:05,  3.09s/it]\u001b[A\n",
            " 70% 141/200 [07:15<03:02,  3.09s/it]\u001b[A\n",
            " 71% 142/200 [07:18<03:00,  3.12s/it]\u001b[A\n",
            " 72% 143/200 [07:21<02:57,  3.11s/it]\u001b[A\n",
            " 72% 144/200 [07:24<02:53,  3.10s/it]\u001b[A\n",
            " 72% 145/200 [07:27<02:51,  3.11s/it]\u001b[A\n",
            " 73% 146/200 [07:31<02:47,  3.11s/it]\u001b[A\n",
            " 74% 147/200 [07:34<02:45,  3.12s/it]\u001b[A\n",
            " 74% 148/200 [07:37<02:42,  3.12s/it]\u001b[A\n",
            " 74% 149/200 [07:40<02:39,  3.12s/it]\u001b[A\n",
            " 75% 150/200 [07:43<02:35,  3.12s/it]\u001b[A\n",
            " 76% 151/200 [07:46<02:32,  3.12s/it]\u001b[A\n",
            " 76% 152/200 [07:49<02:29,  3.11s/it]\u001b[A\n",
            " 76% 153/200 [07:52<02:25,  3.10s/it]\u001b[A\n",
            " 77% 154/200 [07:55<02:22,  3.10s/it]\u001b[A\n",
            " 78% 155/200 [07:59<02:19,  3.10s/it]\u001b[A\n",
            " 78% 156/200 [08:02<02:16,  3.10s/it]\u001b[A\n",
            " 78% 157/200 [08:05<02:13,  3.10s/it]\u001b[A\n",
            " 79% 158/200 [08:08<02:09,  3.09s/it]\u001b[A\n",
            " 80% 159/200 [08:11<02:06,  3.09s/it]\u001b[A\n",
            " 80% 160/200 [08:14<02:03,  3.10s/it]\u001b[A\n",
            " 80% 161/200 [08:17<02:00,  3.09s/it]\u001b[A\n",
            " 81% 162/200 [08:20<01:57,  3.08s/it]\u001b[A\n",
            " 82% 163/200 [08:23<01:54,  3.09s/it]\u001b[A\n",
            " 82% 164/200 [08:26<01:51,  3.09s/it]\u001b[A\n",
            " 82% 165/200 [08:29<01:48,  3.10s/it]\u001b[A\n",
            " 83% 166/200 [08:33<01:45,  3.09s/it]\u001b[A\n",
            " 84% 167/200 [08:36<01:41,  3.08s/it]\u001b[A\n",
            " 84% 168/200 [08:39<01:38,  3.09s/it]\u001b[A\n",
            " 84% 169/200 [08:42<01:36,  3.10s/it]\u001b[A\n",
            " 85% 170/200 [08:45<01:33,  3.10s/it]\u001b[A\n",
            " 86% 171/200 [08:48<01:29,  3.10s/it]\u001b[A\n",
            " 86% 172/200 [08:51<01:27,  3.11s/it]\u001b[A\n",
            " 86% 173/200 [08:54<01:23,  3.10s/it]\u001b[A\n",
            " 87% 174/200 [08:57<01:20,  3.09s/it]\u001b[A\n",
            " 88% 175/200 [09:00<01:17,  3.09s/it]\u001b[A\n",
            " 88% 176/200 [09:03<01:14,  3.09s/it]\u001b[A\n",
            " 88% 177/200 [09:07<01:11,  3.09s/it]\u001b[A\n",
            " 89% 178/200 [09:10<01:08,  3.10s/it]\u001b[A\n",
            " 90% 179/200 [09:13<01:04,  3.09s/it]\u001b[A\n",
            " 90% 180/200 [09:16<01:01,  3.09s/it]\u001b[A\n",
            " 90% 181/200 [09:19<00:58,  3.10s/it]\u001b[A\n",
            " 91% 182/200 [09:22<00:55,  3.09s/it]\u001b[A\n",
            " 92% 183/200 [09:25<00:52,  3.09s/it]\u001b[A\n",
            " 92% 184/200 [09:28<00:49,  3.10s/it]\u001b[A\n",
            " 92% 185/200 [09:31<00:46,  3.09s/it]\u001b[A\n",
            " 93% 186/200 [09:34<00:43,  3.08s/it]\u001b[A\n",
            " 94% 187/200 [09:37<00:40,  3.09s/it]\u001b[A\n",
            " 94% 188/200 [09:41<00:37,  3.09s/it]\u001b[A\n",
            " 94% 189/200 [09:44<00:33,  3.09s/it]\u001b[A\n",
            " 95% 190/200 [09:47<00:30,  3.09s/it]\u001b[A\n",
            " 96% 191/200 [09:50<00:27,  3.09s/it]\u001b[A\n",
            " 96% 192/200 [09:53<00:24,  3.08s/it]\u001b[A\n",
            " 96% 193/200 [09:56<00:21,  3.09s/it]\u001b[A\n",
            " 97% 194/200 [09:59<00:18,  3.09s/it]\u001b[A\n",
            " 98% 195/200 [10:02<00:15,  3.10s/it]\u001b[A\n",
            " 98% 196/200 [10:05<00:12,  3.12s/it]\u001b[A\n",
            " 98% 197/200 [10:09<00:09,  3.13s/it]\u001b[A\n",
            " 99% 198/200 [10:12<00:06,  3.13s/it]\u001b[A\n",
            "100% 199/200 [10:15<00:03,  3.14s/it]\u001b[A[2024-06-04 05:16:18,753] [INFO] [accelerate.accelerator.gather_for_metrics:2380] [PID:3985] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.\n",
            "\n",
            "                                   \n",
            "\u001b[A{'eval_loss': 1.4225280284881592, 'eval_runtime': 621.6046, 'eval_samples_per_second': 0.322, 'eval_steps_per_second': 0.322, 'epoch': 0.52}\n",
            " 53% 28/53 [45:47<14:01, 33.66s/it]\n",
            "100% 200/200 [10:20<00:00,  3.14s/it]\u001b[A\n",
            "{'loss': 2.4271, 'grad_norm': 1.03125, 'learning_rate': 0.0001, 'epoch': 0.54}\n",
            "{'loss': 2.3753, 'grad_norm': 1.09375, 'learning_rate': 9.345968707698569e-05, 'epoch': 0.56}\n",
            "{'loss': 2.5907, 'grad_norm': 1.2578125, 'learning_rate': 8.694738077799488e-05, 'epoch': 0.58}\n",
            "{'loss': 2.5783, 'grad_norm': 1.2890625, 'learning_rate': 8.049096779838719e-05, 'epoch': 0.6}\n",
            "{'loss': 2.0342, 'grad_norm': 1.125, 'learning_rate': 7.411809548974792e-05, 'epoch': 0.61}\n",
            "{'loss': 2.1088, 'grad_norm': 1.2265625, 'learning_rate': 6.785605346968386e-05, 'epoch': 0.63}\n",
            "{'loss': 2.3938, 'grad_norm': 1.1953125, 'learning_rate': 6.173165676349103e-05, 'epoch': 0.65}\n",
            "{'loss': 2.4193, 'grad_norm': 1.25, 'learning_rate': 5.577113097809989e-05, 'epoch': 0.67}\n",
            "{'loss': 2.1314, 'grad_norm': 1.1328125, 'learning_rate': 5.000000000000002e-05, 'epoch': 0.69}\n",
            "{'loss': 2.2812, 'grad_norm': 1.140625, 'learning_rate': 4.444297669803981e-05, 'epoch': 0.71}\n",
            "{'loss': 2.3065, 'grad_norm': 1.140625, 'learning_rate': 3.9123857099127936e-05, 'epoch': 0.73}\n",
            "{'loss': 2.0951, 'grad_norm': 1.1796875, 'learning_rate': 3.406541848999312e-05, 'epoch': 0.75}\n",
            "{'loss': 2.1334, 'grad_norm': 1.0234375, 'learning_rate': 2.9289321881345254e-05, 'epoch': 0.76}\n",
            "{'loss': 2.0984, 'grad_norm': 1.25, 'learning_rate': 2.4816019252102273e-05, 'epoch': 0.78}\n",
            " 79% 42/53 [53:11<06:09, 33.59s/it]\n",
            "  0% 0/200 [00:00<?, ?it/s]\u001b[A\n",
            "  1% 2/200 [00:03<05:03,  1.53s/it]\u001b[A\n",
            "  2% 3/200 [00:06<07:09,  2.18s/it]\u001b[A\n",
            "  2% 4/200 [00:09<08:14,  2.52s/it]\u001b[A\n",
            "  2% 5/200 [00:12<08:48,  2.71s/it]\u001b[A\n",
            "  3% 6/200 [00:15<09:09,  2.83s/it]\u001b[A\n",
            "  4% 7/200 [00:18<09:24,  2.92s/it]\u001b[A\n",
            "  4% 8/200 [00:21<09:32,  2.98s/it]\u001b[A\n",
            "  4% 9/200 [00:24<09:37,  3.02s/it]\u001b[A\n",
            "  5% 10/200 [00:27<09:40,  3.05s/it]\u001b[A\n",
            "  6% 11/200 [00:30<09:38,  3.06s/it]\u001b[A\n",
            "  6% 12/200 [00:34<09:36,  3.07s/it]\u001b[A\n",
            "  6% 13/200 [00:37<09:37,  3.09s/it]\u001b[A\n",
            "  7% 14/200 [00:40<09:35,  3.09s/it]\u001b[A\n",
            "  8% 15/200 [00:43<09:32,  3.10s/it]\u001b[A\n",
            "  8% 16/200 [00:46<09:31,  3.10s/it]\u001b[A\n",
            "  8% 17/200 [00:49<09:28,  3.10s/it]\u001b[A\n",
            "  9% 18/200 [00:52<09:26,  3.11s/it]\u001b[A\n",
            " 10% 19/200 [00:55<09:23,  3.11s/it]\u001b[A\n",
            " 10% 20/200 [00:58<09:21,  3.12s/it]\u001b[A\n",
            " 10% 21/200 [01:02<09:17,  3.12s/it]\u001b[A\n",
            " 11% 22/200 [01:05<09:15,  3.12s/it]\u001b[A\n",
            " 12% 23/200 [01:08<09:10,  3.11s/it]\u001b[A\n",
            " 12% 24/200 [01:11<09:05,  3.10s/it]\u001b[A\n",
            " 12% 25/200 [01:14<09:02,  3.10s/it]\u001b[A\n",
            " 13% 26/200 [01:17<08:59,  3.10s/it]\u001b[A\n",
            " 14% 27/200 [01:20<08:58,  3.11s/it]\u001b[A\n",
            " 14% 28/200 [01:23<08:55,  3.11s/it]\u001b[A\n",
            " 14% 29/200 [01:26<08:50,  3.10s/it]\u001b[A\n",
            " 15% 30/200 [01:30<08:47,  3.10s/it]\u001b[A\n",
            " 16% 31/200 [01:33<08:44,  3.10s/it]\u001b[A\n",
            " 16% 32/200 [01:36<08:41,  3.10s/it]\u001b[A\n",
            " 16% 33/200 [01:39<08:37,  3.10s/it]\u001b[A\n",
            " 17% 34/200 [01:42<08:34,  3.10s/it]\u001b[A\n",
            " 18% 35/200 [01:45<08:31,  3.10s/it]\u001b[A\n",
            " 18% 36/200 [01:48<08:28,  3.10s/it]\u001b[A\n",
            " 18% 37/200 [01:51<08:25,  3.10s/it]\u001b[A\n",
            " 19% 38/200 [01:54<08:22,  3.10s/it]\u001b[A\n",
            " 20% 39/200 [01:57<08:18,  3.10s/it]\u001b[A\n",
            " 20% 40/200 [02:01<08:16,  3.10s/it]\u001b[A\n",
            " 20% 41/200 [02:04<08:12,  3.10s/it]\u001b[A\n",
            " 21% 42/200 [02:07<08:07,  3.08s/it]\u001b[A\n",
            " 22% 43/200 [02:10<08:05,  3.09s/it]\u001b[A\n",
            " 22% 44/200 [02:13<08:00,  3.08s/it]\u001b[A\n",
            " 22% 45/200 [02:16<07:56,  3.08s/it]\u001b[A\n",
            " 23% 46/200 [02:19<07:54,  3.08s/it]\u001b[A\n",
            " 24% 47/200 [02:22<07:51,  3.08s/it]\u001b[A\n",
            " 24% 48/200 [02:25<07:49,  3.09s/it]\u001b[A\n",
            " 24% 49/200 [02:28<07:47,  3.09s/it]\u001b[A\n",
            " 25% 50/200 [02:31<07:43,  3.09s/it]\u001b[A\n",
            " 26% 51/200 [02:34<07:39,  3.08s/it]\u001b[A\n",
            " 26% 52/200 [02:38<07:37,  3.09s/it]\u001b[A\n",
            " 26% 53/200 [02:41<07:34,  3.09s/it]\u001b[A\n",
            " 27% 54/200 [02:44<07:31,  3.10s/it]\u001b[A\n",
            " 28% 55/200 [02:47<07:29,  3.10s/it]\u001b[A\n",
            " 28% 56/200 [02:50<07:25,  3.10s/it]\u001b[A\n",
            " 28% 57/200 [02:53<07:22,  3.10s/it]\u001b[A\n",
            " 29% 58/200 [02:56<07:18,  3.09s/it]\u001b[A\n",
            " 30% 59/200 [02:59<07:13,  3.08s/it]\u001b[A\n",
            " 30% 60/200 [03:02<07:10,  3.07s/it]\u001b[A\n",
            " 30% 61/200 [03:05<07:08,  3.08s/it]\u001b[A\n",
            " 31% 62/200 [03:08<07:03,  3.07s/it]\u001b[A\n",
            " 32% 63/200 [03:11<07:01,  3.07s/it]\u001b[A\n",
            " 32% 64/200 [03:15<06:59,  3.09s/it]\u001b[A\n",
            " 32% 65/200 [03:18<06:55,  3.08s/it]\u001b[A\n",
            " 33% 66/200 [03:21<06:51,  3.07s/it]\u001b[A\n",
            " 34% 67/200 [03:24<06:49,  3.08s/it]\u001b[A\n",
            " 34% 68/200 [03:27<06:47,  3.08s/it]\u001b[A\n",
            " 34% 69/200 [03:30<06:43,  3.08s/it]\u001b[A\n",
            " 35% 70/200 [03:33<06:41,  3.09s/it]\u001b[A\n",
            " 36% 71/200 [03:36<06:38,  3.09s/it]\u001b[A\n",
            " 36% 72/200 [03:39<06:36,  3.10s/it]\u001b[A\n",
            " 36% 73/200 [03:42<06:33,  3.10s/it]\u001b[A\n",
            " 37% 74/200 [03:45<06:29,  3.09s/it]\u001b[A\n",
            " 38% 75/200 [03:49<06:26,  3.10s/it]\u001b[A\n",
            " 38% 76/200 [03:52<06:24,  3.10s/it]\u001b[A\n",
            " 38% 77/200 [03:55<06:20,  3.09s/it]\u001b[A\n",
            " 39% 78/200 [03:58<06:16,  3.08s/it]\u001b[A\n",
            " 40% 79/200 [04:01<06:12,  3.08s/it]\u001b[A\n",
            " 40% 80/200 [04:04<06:08,  3.07s/it]\u001b[A\n",
            " 40% 81/200 [04:07<06:05,  3.07s/it]\u001b[A\n",
            " 41% 82/200 [04:10<06:03,  3.08s/it]\u001b[A\n",
            " 42% 83/200 [04:13<06:01,  3.09s/it]\u001b[A\n",
            " 42% 84/200 [04:16<05:59,  3.10s/it]\u001b[A\n",
            " 42% 85/200 [04:19<05:56,  3.10s/it]\u001b[A\n",
            " 43% 86/200 [04:22<05:52,  3.09s/it]\u001b[A\n",
            " 44% 87/200 [04:26<05:47,  3.08s/it]\u001b[A\n",
            " 44% 88/200 [04:29<05:45,  3.08s/it]\u001b[A\n",
            " 44% 89/200 [04:32<05:41,  3.08s/it]\u001b[A\n",
            " 45% 90/200 [04:35<05:37,  3.07s/it]\u001b[A\n",
            " 46% 91/200 [04:38<05:34,  3.07s/it]\u001b[A\n",
            " 46% 92/200 [04:41<05:30,  3.06s/it]\u001b[A\n",
            " 46% 93/200 [04:44<05:28,  3.07s/it]\u001b[A\n",
            " 47% 94/200 [04:47<05:25,  3.07s/it]\u001b[A\n",
            " 48% 95/200 [04:50<05:22,  3.07s/it]\u001b[A\n",
            " 48% 96/200 [04:53<05:19,  3.07s/it]\u001b[A\n",
            " 48% 97/200 [04:56<05:17,  3.09s/it]\u001b[A\n",
            " 49% 98/200 [04:59<05:14,  3.08s/it]\u001b[A\n",
            " 50% 99/200 [05:02<05:10,  3.07s/it]\u001b[A\n",
            " 50% 100/200 [05:06<05:08,  3.08s/it]\u001b[A\n",
            " 50% 101/200 [05:09<05:05,  3.09s/it]\u001b[A\n",
            " 51% 102/200 [05:12<05:02,  3.09s/it]\u001b[A\n",
            " 52% 103/200 [05:15<04:59,  3.09s/it]\u001b[A\n",
            " 52% 104/200 [05:18<04:55,  3.08s/it]\u001b[A\n",
            " 52% 105/200 [05:21<04:52,  3.08s/it]\u001b[A\n",
            " 53% 106/200 [05:24<04:49,  3.08s/it]\u001b[A\n",
            " 54% 107/200 [05:27<04:46,  3.08s/it]\u001b[A\n",
            " 54% 108/200 [05:30<04:43,  3.08s/it]\u001b[A\n",
            " 55% 109/200 [05:33<04:41,  3.09s/it]\u001b[A\n",
            " 55% 110/200 [05:36<04:38,  3.09s/it]\u001b[A\n",
            " 56% 111/200 [05:39<04:35,  3.09s/it]\u001b[A\n",
            " 56% 112/200 [05:43<04:32,  3.09s/it]\u001b[A\n",
            " 56% 113/200 [05:46<04:28,  3.09s/it]\u001b[A\n",
            " 57% 114/200 [05:49<04:25,  3.08s/it]\u001b[A\n",
            " 57% 115/200 [05:52<04:23,  3.09s/it]\u001b[A\n",
            " 58% 116/200 [05:55<04:19,  3.09s/it]\u001b[A\n",
            " 58% 117/200 [05:58<04:16,  3.09s/it]\u001b[A\n",
            " 59% 118/200 [06:01<04:14,  3.10s/it]\u001b[A\n",
            " 60% 119/200 [06:04<04:10,  3.10s/it]\u001b[A\n",
            " 60% 120/200 [06:07<04:07,  3.09s/it]\u001b[A\n",
            " 60% 121/200 [06:10<04:03,  3.09s/it]\u001b[A\n",
            " 61% 122/200 [06:13<04:00,  3.08s/it]\u001b[A\n",
            " 62% 123/200 [06:17<03:56,  3.08s/it]\u001b[A\n",
            " 62% 124/200 [06:20<03:54,  3.08s/it]\u001b[A\n",
            " 62% 125/200 [06:23<03:51,  3.08s/it]\u001b[A\n",
            " 63% 126/200 [06:26<03:48,  3.09s/it]\u001b[A\n",
            " 64% 127/200 [06:29<03:45,  3.10s/it]\u001b[A\n",
            " 64% 128/200 [06:32<03:42,  3.09s/it]\u001b[A\n",
            " 64% 129/200 [06:35<03:39,  3.09s/it]\u001b[A\n",
            " 65% 130/200 [06:38<03:37,  3.10s/it]\u001b[A\n",
            " 66% 131/200 [06:41<03:33,  3.09s/it]\u001b[A\n",
            " 66% 132/200 [06:44<03:30,  3.09s/it]\u001b[A\n",
            " 66% 133/200 [06:48<03:28,  3.11s/it]\u001b[A\n",
            " 67% 134/200 [06:51<03:24,  3.10s/it]\u001b[A\n",
            " 68% 135/200 [06:54<03:22,  3.11s/it]\u001b[A\n",
            " 68% 136/200 [06:57<03:18,  3.11s/it]\u001b[A\n",
            " 68% 137/200 [07:00<03:15,  3.10s/it]\u001b[A\n",
            " 69% 138/200 [07:03<03:12,  3.10s/it]\u001b[A\n",
            " 70% 139/200 [07:06<03:09,  3.11s/it]\u001b[A\n",
            " 70% 140/200 [07:09<03:05,  3.09s/it]\u001b[A\n",
            " 70% 141/200 [07:12<03:02,  3.10s/it]\u001b[A\n",
            " 71% 142/200 [07:15<03:00,  3.11s/it]\u001b[A\n",
            " 72% 143/200 [07:19<02:57,  3.11s/it]\u001b[A\n",
            " 72% 144/200 [07:22<02:53,  3.11s/it]\u001b[A\n",
            " 72% 145/200 [07:25<02:50,  3.11s/it]\u001b[A\n",
            " 73% 146/200 [07:28<02:47,  3.10s/it]\u001b[A\n",
            " 74% 147/200 [07:31<02:44,  3.10s/it]\u001b[A\n",
            " 74% 148/200 [07:34<02:41,  3.10s/it]\u001b[A\n",
            " 74% 149/200 [07:37<02:37,  3.09s/it]\u001b[A\n",
            " 75% 150/200 [07:40<02:35,  3.11s/it]\u001b[A\n",
            " 76% 151/200 [07:43<02:32,  3.11s/it]\u001b[A\n",
            " 76% 152/200 [07:46<02:28,  3.09s/it]\u001b[A\n",
            " 76% 153/200 [07:50<02:25,  3.09s/it]\u001b[A\n",
            " 77% 154/200 [07:53<02:22,  3.10s/it]\u001b[A\n",
            " 78% 155/200 [07:56<02:18,  3.09s/it]\u001b[A\n",
            " 78% 156/200 [07:59<02:15,  3.08s/it]\u001b[A\n",
            " 78% 157/200 [08:02<02:12,  3.09s/it]\u001b[A\n",
            " 79% 158/200 [08:05<02:09,  3.08s/it]\u001b[A\n",
            " 80% 159/200 [08:08<02:06,  3.08s/it]\u001b[A\n",
            " 80% 160/200 [08:11<02:03,  3.09s/it]\u001b[A\n",
            " 80% 161/200 [08:14<02:00,  3.09s/it]\u001b[A\n",
            " 81% 162/200 [08:17<01:57,  3.08s/it]\u001b[A\n",
            " 82% 163/200 [08:20<01:54,  3.08s/it]\u001b[A\n",
            " 82% 164/200 [08:23<01:51,  3.08s/it]\u001b[A\n",
            " 82% 165/200 [08:27<01:48,  3.09s/it]\u001b[A\n",
            " 83% 166/200 [08:30<01:45,  3.09s/it]\u001b[A\n",
            " 84% 167/200 [08:33<01:41,  3.08s/it]\u001b[A\n",
            " 84% 168/200 [08:36<01:38,  3.09s/it]\u001b[A\n",
            " 84% 169/200 [08:39<01:36,  3.10s/it]\u001b[A\n",
            " 85% 170/200 [08:42<01:32,  3.09s/it]\u001b[A\n",
            " 86% 171/200 [08:45<01:29,  3.08s/it]\u001b[A\n",
            " 86% 172/200 [08:48<01:26,  3.09s/it]\u001b[A\n",
            " 86% 173/200 [08:51<01:23,  3.09s/it]\u001b[A\n",
            " 87% 174/200 [08:54<01:19,  3.07s/it]\u001b[A\n",
            " 88% 175/200 [08:57<01:17,  3.08s/it]\u001b[A\n",
            " 88% 176/200 [09:00<01:13,  3.07s/it]\u001b[A\n",
            " 88% 177/200 [09:04<01:10,  3.07s/it]\u001b[A\n",
            " 89% 178/200 [09:07<01:07,  3.08s/it]\u001b[A\n",
            " 90% 179/200 [09:10<01:04,  3.07s/it]\u001b[A\n",
            " 90% 180/200 [09:13<01:01,  3.07s/it]\u001b[A\n",
            " 90% 181/200 [09:16<00:58,  3.08s/it]\u001b[A\n",
            " 91% 182/200 [09:19<00:55,  3.09s/it]\u001b[A\n",
            " 92% 183/200 [09:22<00:52,  3.09s/it]\u001b[A\n",
            " 92% 184/200 [09:25<00:49,  3.09s/it]\u001b[A\n",
            " 92% 185/200 [09:28<00:46,  3.08s/it]\u001b[A\n",
            " 93% 186/200 [09:31<00:43,  3.07s/it]\u001b[A\n",
            " 94% 187/200 [09:34<00:40,  3.08s/it]\u001b[A\n",
            " 94% 188/200 [09:37<00:36,  3.08s/it]\u001b[A\n",
            " 94% 189/200 [09:40<00:33,  3.07s/it]\u001b[A\n",
            " 95% 190/200 [09:44<00:30,  3.08s/it]\u001b[A\n",
            " 96% 191/200 [09:47<00:27,  3.07s/it]\u001b[A\n",
            " 96% 192/200 [09:50<00:24,  3.07s/it]\u001b[A\n",
            " 96% 193/200 [09:53<00:21,  3.07s/it]\u001b[A\n",
            " 97% 194/200 [09:56<00:18,  3.06s/it]\u001b[A\n",
            " 98% 195/200 [09:59<00:15,  3.07s/it]\u001b[A\n",
            " 98% 196/200 [10:02<00:12,  3.08s/it]\u001b[A\n",
            " 98% 197/200 [10:05<00:09,  3.09s/it]\u001b[A\n",
            " 99% 198/200 [10:08<00:06,  3.09s/it]\u001b[A\n",
            "100% 199/200 [10:11<00:03,  3.10s/it]\u001b[A[2024-06-04 05:34:01,163] [INFO] [accelerate.accelerator.gather_for_metrics:2380] [PID:3985] The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.\n",
            "\n",
            "                                   \n",
            "\u001b[A{'eval_loss': 1.3979419469833374, 'eval_runtime': 617.906, 'eval_samples_per_second': 0.324, 'eval_steps_per_second': 0.324, 'epoch': 0.78}\n",
            " 79% 42/53 [1:03:29<06:09, 33.59s/it]\n",
            "100% 200/200 [10:16<00:00,  3.09s/it]\u001b[A\n",
            "{'loss': 2.1199, 'grad_norm': 1.2421875, 'learning_rate': 2.0664665970876496e-05, 'epoch': 0.8}\n",
            "{'loss': 2.0377, 'grad_norm': 1.203125, 'learning_rate': 1.6853038769745467e-05, 'epoch': 0.82}\n",
            "{'loss': 2.1479, 'grad_norm': 1.234375, 'learning_rate': 1.339745962155613e-05, 'epoch': 0.84}\n",
            "{'loss': 2.1424, 'grad_norm': 1.0390625, 'learning_rate': 1.0312725846731175e-05, 'epoch': 0.86}\n",
            "{'loss': 2.1937, 'grad_norm': 1.1796875, 'learning_rate': 7.612046748871327e-06, 'epoch': 0.88}\n",
            "{'loss': 2.1982, 'grad_norm': 1.1015625, 'learning_rate': 5.306987050489442e-06, 'epoch': 0.89}\n",
            "{'loss': 2.1528, 'grad_norm': 1.1875, 'learning_rate': 3.40741737109318e-06, 'epoch': 0.91}\n",
            "{'loss': 2.0287, 'grad_norm': 1.125, 'learning_rate': 1.921471959676957e-06, 'epoch': 0.93}\n",
            "{'loss': 2.0022, 'grad_norm': 1.0234375, 'learning_rate': 8.555138626189618e-07, 'epoch': 0.95}\n",
            "{'loss': 2.3406, 'grad_norm': 1.1796875, 'learning_rate': 2.141076761396521e-07, 'epoch': 0.97}\n",
            "{'loss': 2.2924, 'grad_norm': 1.1640625, 'learning_rate': 0.0, 'epoch': 0.99}\n",
            "100% 53/53 [1:09:18<00:00, 36.98s/it]/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
            "  warnings.warn(\n",
            "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
            "  warnings.warn(\n",
            "{'train_runtime': 4159.9986, 'train_samples_per_second': 0.433, 'train_steps_per_second': 0.013, 'train_loss': 2.8079717429179065, 'epoch': 0.99}\n",
            "100% 53/53 [1:09:19<00:00, 78.49s/it]\n",
            "[2024-06-04 05:39:53,170] [INFO] [axolotl.train.train:173] [PID:3985] [RANK:0] Training Completed!!! Saving pre-trained model to ./outputs/out\u001b[39m\n",
            "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
            "  warnings.warn(\n",
            "(PeftModelForCausalLM(   (base_model): LoraModel(     (model): GemmaForCausalLM(       (model): GemmaModel(         (embed_tokens): Embedding(256000, 2048, padding_idx=0)         (layers): ModuleList(           (0-17): 18 x GemmaDecoderLayer(             (self_attn): GemmaSdpaAttention(               (q_proj): lora.Linear4bit(                 (base_layer): Linear4bit(in_features=2048, out_features=2048, bias=False)                 (lora_dropout): ModuleDict(                   (default): Dropout(p=0.05, inplace=False)                 )                 (lora_A): ModuleDict(                   (default): Linear(in_features=2048, out_features=4, bias=False)                 )                 (lora_B): ModuleDict(                   (default): Linear(in_features=4, out_features=2048, bias=False)                 )                 (lora_embedding_A): ParameterDict()                 (lora_embedding_B): ParameterDict()               )               (k_proj): lora.Linear4bit(                 (base_layer): Linear4bit(in_features=2048, out_features=256, bias=False)                 (lora_dropout): ModuleDict(                   (default): Dropout(p=0.05, inplace=False)                 )                 (lora_A): ModuleDict(                   (default): Linear(in_features=2048, out_features=4, bias=False)                 )                 (lora_B): ModuleDict(                   (default): Linear(in_features=4, out_features=256, bias=False)                 )                 (lora_embedding_A): ParameterDict()                 (lora_embedding_B): ParameterDict()               )               (v_proj): lora.Linear4bit(                 (base_layer): Linear4bit(in_features=2048, out_features=256, bias=False)                 (lora_dropout): ModuleDict(                   (default): Dropout(p=0.05, inplace=False)                 )                 (lora_A): ModuleDict(                   (default): Linear(in_features=2048, out_features=4, bias=False)                 )              
   (lora_B): ModuleDict(                   (default): Linear(in_features=4, out_features=256, bias=False)                 )                 (lora_embedding_A): ParameterDict()                 (lora_embedding_B): ParameterDict()               )               (o_proj): lora.Linear4bit(                 (base_layer): Linear4bit(in_features=2048, out_features=2048, bias=False)                 (lora_dropout): ModuleDict(                   (default): Dropout(p=0.05, inplace=False)                 )                 (lora_A): ModuleDict(                   (default): Linear(in_features=2048, out_features=4, bias=False)                 )                 (lora_B): ModuleDict(                   (default): Linear(in_features=4, out_features=2048, bias=False)                 )                 (lora_embedding_A): ParameterDict()                 (lora_embedding_B): ParameterDict()               )               (rotary_emb): GemmaRotaryEmbedding()             )             (mlp): GemmaMLP(               (gate_proj): lora.Linear4bit(                 (base_layer): Linear4bit(in_features=2048, out_features=16384, bias=False)                 (lora_dropout): ModuleDict(                   (default): Dropout(p=0.05, inplace=False)                 )                 (lora_A): ModuleDict(                   (default): Linear(in_features=2048, out_features=4, bias=False)                 )                 (lora_B): ModuleDict(                   (default): Linear(in_features=4, out_features=16384, bias=False)                 )                 (lora_embedding_A): ParameterDict()                 (lora_embedding_B): ParameterDict()               )               (up_proj): lora.Linear4bit(                 (base_layer): Linear4bit(in_features=2048, out_features=16384, bias=False)                 (lora_dropout): ModuleDict(                   (default): Dropout(p=0.05, inplace=False)                 )                 (lora_A): ModuleDict(                   (default): Linear(in_features=2048, 
out_features=4, bias=False)                 )                 (lora_B): ModuleDict(                   (default): Linear(in_features=4, out_features=16384, bias=False)                 )                 (lora_embedding_A): ParameterDict()                 (lora_embedding_B): ParameterDict()               )               (down_proj): lora.Linear4bit(                 (base_layer): Linear4bit(in_features=16384, out_features=2048, bias=False)                 (lora_dropout): ModuleDict(                   (default): Dropout(p=0.05, inplace=False)                 )                 (lora_A): ModuleDict(                   (default): Linear(in_features=16384, out_features=4, bias=False)                 )                 (lora_B): ModuleDict(                   (default): Linear(in_features=4, out_features=2048, bias=False)                 )                 (lora_embedding_A): ParameterDict()                 (lora_embedding_B): ParameterDict()               )               (act_fn): PytorchGELUTanh()             )             (input_layernorm): GemmaRMSNorm()             (post_attention_layernorm): GemmaRMSNorm()           )         )         (norm): GemmaRMSNorm()       )       (lm_head): Linear(in_features=2048, out_features=256000, bias=False)     )   ) ), GemmaTokenizerFast(name_or_path='google/gemma-2b', vocab_size=256000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<bos>', 'eos_token': '<eos>', 'unk_token': '<unk>', 'pad_token': '<pad>', 'additional_special_tokens': ['<start_of_turn>', '<end_of_turn>']}, clean_up_tokenization_spaces=False),  added_tokens_decoder={ \t0: AddedToken(\"<pad>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), \t1: AddedToken(\"<eos>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), \t2: AddedToken(\"<bos>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 
\t3: AddedToken(\"<unk>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), \t4: AddedToken(\"<mask>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t5: AddedToken(\"<2mass>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t6: AddedToken(\"[@BOS@]\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t7: AddedToken(\"<unused0>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t8: AddedToken(\"<unused1>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t9: AddedToken(\"<unused2>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t10: AddedToken(\"<unused3>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t11: AddedToken(\"<unused4>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t12: AddedToken(\"<unused5>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t13: AddedToken(\"<unused6>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t14: AddedToken(\"<unused7>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t15: AddedToken(\"<unused8>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t16: AddedToken(\"<unused9>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t17: AddedToken(\"<unused10>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t18: AddedToken(\"<unused11>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t19: AddedToken(\"<unused12>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t20: AddedToken(\"<unused13>\", rstrip=False, lstrip=False, single_word=False, 
normalized=False, special=False), \t21: AddedToken(\"<unused14>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t22: AddedToken(\"<unused15>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t23: AddedToken(\"<unused16>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t24: AddedToken(\"<unused17>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t25: AddedToken(\"<unused18>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t26: AddedToken(\"<unused19>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t27: AddedToken(\"<unused20>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t28: AddedToken(\"<unused21>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t29: AddedToken(\"<unused22>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t30: AddedToken(\"<unused23>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t31: AddedToken(\"<unused24>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t32: AddedToken(\"<unused25>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t33: AddedToken(\"<unused26>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t34: AddedToken(\"<unused27>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t35: AddedToken(\"<unused28>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t36: AddedToken(\"<unused29>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t37: AddedToken(\"<unused30>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t38: 
AddedToken(\"<unused31>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t39: AddedToken(\"<unused32>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t40: AddedToken(\"<unused33>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t41: AddedToken(\"<unused34>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t42: AddedToken(\"<unused35>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t43: AddedToken(\"<unused36>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t44: AddedToken(\"<unused37>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t45: AddedToken(\"<unused38>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t46: AddedToken(\"<unused39>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t47: AddedToken(\"<unused40>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t48: AddedToken(\"<unused41>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t49: AddedToken(\"<unused42>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t50: AddedToken(\"<unused43>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t51: AddedToken(\"<unused44>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t52: AddedToken(\"<unused45>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t53: AddedToken(\"<unused46>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t54: AddedToken(\"<unused47>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t55: AddedToken(\"<unused48>\", rstrip=False, lstrip=False, 
single_word=False, normalized=False, special=False), \t56: AddedToken(\"<unused49>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t57: AddedToken(\"<unused50>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t58: AddedToken(\"<unused51>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t59: AddedToken(\"<unused52>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t60: AddedToken(\"<unused53>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t61: AddedToken(\"<unused54>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t62: AddedToken(\"<unused55>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t63: AddedToken(\"<unused56>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t64: AddedToken(\"<unused57>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t65: AddedToken(\"<unused58>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t66: AddedToken(\"<unused59>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t67: AddedToken(\"<unused60>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t68: AddedToken(\"<unused61>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t69: AddedToken(\"<unused62>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t70: AddedToken(\"<unused63>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t71: AddedToken(\"<unused64>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t72: AddedToken(\"<unused65>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t73: 
AddedToken(\"<unused66>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t74: AddedToken(\"<unused67>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t75: AddedToken(\"<unused68>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t76: AddedToken(\"<unused69>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t77: AddedToken(\"<unused70>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t78: AddedToken(\"<unused71>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t79: AddedToken(\"<unused72>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t80: AddedToken(\"<unused73>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t81: AddedToken(\"<unused74>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t82: AddedToken(\"<unused75>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t83: AddedToken(\"<unused76>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t84: AddedToken(\"<unused77>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t85: AddedToken(\"<unused78>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t86: AddedToken(\"<unused79>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t87: AddedToken(\"<unused80>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t88: AddedToken(\"<unused81>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t89: AddedToken(\"<unused82>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t90: AddedToken(\"<unused83>\", rstrip=False, lstrip=False, 
single_word=False, normalized=False, special=False), \t91: AddedToken(\"<unused84>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t92: AddedToken(\"<unused85>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t93: AddedToken(\"<unused86>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t94: AddedToken(\"<unused87>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t95: AddedToken(\"<unused88>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t96: AddedToken(\"<unused89>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t97: AddedToken(\"<unused90>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t98: AddedToken(\"<unused91>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t99: AddedToken(\"<unused92>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t100: AddedToken(\"<unused93>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t101: AddedToken(\"<unused94>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t102: AddedToken(\"<unused95>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t103: AddedToken(\"<unused96>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t104: AddedToken(\"<unused97>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t105: AddedToken(\"<unused98>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t106: AddedToken(\"<start_of_turn>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), \t107: AddedToken(\"<end_of_turn>\", rstrip=False, lstrip=False, single_word=False, normalized=False, 
special=True), \t108: AddedToken(\" \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t109: AddedToken(\"  \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t110: AddedToken(\"   \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t111: AddedToken(\"    \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t112: AddedToken(\"     \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t113: AddedToken(\"      \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t114: AddedToken(\"       \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t115: AddedToken(\"        \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t116: AddedToken(\"         \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t117: AddedToken(\"          \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t118: AddedToken(\"           \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t119: AddedToken(\"            \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t120: AddedToken(\"             \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t121: AddedToken(\"              \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t122: AddedToken(\"               \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t123: AddedToken(\"                \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t124: AddedToken(\"                 \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t125: AddedToken(\"                  \", 
rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t126: AddedToken(\"                   \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t127: AddedToken(\"                    \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t128: AddedToken(\"                     \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t129: AddedToken(\"                      \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t130: AddedToken(\"                       \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t131: AddedToken(\"                        \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t132: AddedToken(\"                         \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t133: AddedToken(\"                          \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t134: AddedToken(\"                           \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t135: AddedToken(\"                            \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t136: AddedToken(\"                             \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t137: AddedToken(\"                              \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t138: AddedToken(\"                               \", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t139: AddedToken(\"▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t140: AddedToken(\"▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t141: 
AddedToken(\"▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t142: AddedToken(\"▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t143: AddedToken(\"▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t144: AddedToken(\"▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t145: AddedToken(\"▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t146: AddedToken(\"▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t147: AddedToken(\"▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t148: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t149: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t150: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t151: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t152: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t153: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t154: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t155: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t156: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t157: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t158: 
AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t159: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t160: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t161: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t162: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t163: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t164: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t165: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t166: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t167: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t168: AddedToken(\"▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t169: AddedToken(\"<table>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t170: AddedToken(\"<caption>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t171: AddedToken(\"<thead>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t172: AddedToken(\"<tbody>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t173: AddedToken(\"<tfoot>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 
\t174: AddedToken(\"<tr>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t175: AddedToken(\"<th>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t176: AddedToken(\"<td>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t177: AddedToken(\"</table>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t178: AddedToken(\"</caption>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t179: AddedToken(\"</thead>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t180: AddedToken(\"</tbody>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t181: AddedToken(\"</tfoot>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t182: AddedToken(\"</tr>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t183: AddedToken(\"</th>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t184: AddedToken(\"</td>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t185: AddedToken(\"<h1>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t186: AddedToken(\"<h2>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t187: AddedToken(\"<h3>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t188: AddedToken(\"<h4>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t189: AddedToken(\"<h5>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t190: AddedToken(\"<h6>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t191: AddedToken(\"<blockquote>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), 
\t192: AddedToken(\"</h1>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t193: AddedToken(\"</h2>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t194: AddedToken(\"</h3>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t195: AddedToken(\"</h4>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t196: AddedToken(\"</h5>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t197: AddedToken(\"</h6>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t198: AddedToken(\"</blockquote>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t199: AddedToken(\"<strong>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t200: AddedToken(\"<em>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t201: AddedToken(\"<b>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t202: AddedToken(\"<i>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t203: AddedToken(\"<u>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t204: AddedToken(\"<s>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t205: AddedToken(\"<sub>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t206: AddedToken(\"<sup>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t207: AddedToken(\"<code>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t208: AddedToken(\"</strong>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t209: AddedToken(\"</em>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t210: 
AddedToken(\"</b>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t211: AddedToken(\"</i>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t212: AddedToken(\"</u>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t213: AddedToken(\"</s>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t214: AddedToken(\"</sub>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t215: AddedToken(\"</sup>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), \t216: AddedToken(\"</code>\", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False), })\n",
            "\u001b[0m"
          ]
        }
      ],
      "source": [
        "!python -m axolotl.cli.train /content/gemma_axolotl.yaml"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hHHaSsooKnZT"
      },
      "source": [
        "## Upload finetuned model to Hugging Face\n",
        "### Merge LoRA adapter\n",
        "Merging the adapter takes quite a bit of memory, so you may need to use a high-RAM Colab instance to avoid crashing."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "id": "Yh1Gy_eMKuND"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[2024-06-04 05:40:00,263] [INFO] [numexpr.utils._init_num_threads:161] [PID:22017] NumExpr defaulting to 8 threads.\n",
            "[2024-06-04 05:40:00,421] [INFO] [datasets.<module>:58] [PID:22017] PyTorch version 2.1.2 available.\n",
            "[2024-06-04 05:40:00,422] [INFO] [datasets.<module>:70] [PID:22017] Polars version 0.20.2 available.\n",
            "[2024-06-04 05:40:00,423] [INFO] [datasets.<module>:105] [PID:22017] TensorFlow version 2.15.0 available.\n",
            "[2024-06-04 05:40:00,424] [INFO] [datasets.<module>:118] [PID:22017] JAX version 0.4.26 available.\n",
            "2024-06-04 05:40:01.500291: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
            "2024-06-04 05:40:01.500346: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
            "2024-06-04 05:40:01.501636: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
            "2024-06-04 05:40:01.508907: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
            "To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
            "2024-06-04 05:40:02.544200: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
            "[2024-06-04 05:40:03,924] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n",
            "                                 dP            dP   dP \n",
            "                                 88            88   88 \n",
            "      .d8888b. dP.  .dP .d8888b. 88 .d8888b. d8888P 88 \n",
            "      88'  `88  `8bd8'  88'  `88 88 88'  `88   88   88 \n",
            "      88.  .88  .d88b.  88.  .88 88 88.  .88   88   88 \n",
            "      `88888P8 dP'  `dP `88888P' dP `88888P'   dP   dP \n",
            "                                                       \n",
            "                                                       \n",
            "\n",
            "****************************************\n",
            "**** Axolotl Dependency Versions *****\n",
            "  accelerate: 0.30.1         \n",
            "        peft: 0.11.1         \n",
            "transformers: 4.41.1         \n",
            "         trl: 0.8.6          \n",
            "       torch: 2.1.2          \n",
            "bitsandbytes: 0.43.1         \n",
            "****************************************\n",
            "\u001b[33m[2024-06-04 05:40:05,725] [WARNING] [axolotl.utils.config.models.input.check_sample_packing_wo_flash:730] [PID:22017] [RANK:0] sample_packing without flash_attention or sdp_attention does not handle cross-attention.\u001b[39m\n",
            "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
            "  warnings.warn(\n",
            "[2024-06-04 05:40:05,880] [INFO] [axolotl.normalize_config:182] [PID:22017] [RANK:0] GPU memory usage baseline: 0.000GB (+0.255GB misc)\u001b[39m\n",
            "[2024-06-04 05:40:05,880] [INFO] [axolotl.common.cli.load_model_and_tokenizer:50] [PID:22017] [RANK:0] loading tokenizer... google/gemma-2b\u001b[39m\n",
            "[2024-06-04 05:40:06,808] [DEBUG] [axolotl.load_tokenizer:280] [PID:22017] [RANK:0] EOS: 1 / <eos>\u001b[39m\n",
            "[2024-06-04 05:40:06,808] [DEBUG] [axolotl.load_tokenizer:281] [PID:22017] [RANK:0] BOS: 2 / <bos>\u001b[39m\n",
            "[2024-06-04 05:40:06,809] [DEBUG] [axolotl.load_tokenizer:282] [PID:22017] [RANK:0] PAD: 0 / <pad>\u001b[39m\n",
            "[2024-06-04 05:40:06,809] [DEBUG] [axolotl.load_tokenizer:283] [PID:22017] [RANK:0] UNK: 3 / <unk>\u001b[39m\n",
            "[2024-06-04 05:40:06,809] [INFO] [axolotl.load_tokenizer:294] [PID:22017] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.\u001b[39m\n",
            "[2024-06-04 05:40:06,809] [INFO] [axolotl.common.cli.load_model_and_tokenizer:52] [PID:22017] [RANK:0] loading model and (optionally) peft_config...\u001b[39m\n",
            "`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.\n",
            "Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use\n",
            "`config.hidden_activation` if you want to override this behaviour.\n",
            "See https://github.com/huggingface/transformers/pull/29402 for more details.\n",
            "[2024-06-04 05:40:07,033] [INFO] [accelerate.utils.modeling.get_balanced_memory:989] [PID:22017] We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).\n",
            "Loading checkpoint shards: 100% 2/2 [00:05<00:00,  2.59s/it]\n",
            "[2024-06-04 05:40:12,419] [INFO] [axolotl.load_model:734] [PID:22017] [RANK:0] GPU memory usage after model load: 9.336GB (+0.002GB cache, +0.352GB misc)\u001b[39m\n",
            "[2024-06-04 05:40:12,421] [INFO] [axolotl.load_model:794] [PID:22017] [RANK:0] converting modules to torch.float32 for flash attention\u001b[39m\n",
            "[2024-06-04 05:40:12,423] [INFO] [axolotl.load_lora:951] [PID:22017] [RANK:0] found linear modules: ['v_proj', 'up_proj', 'k_proj', 'o_proj', 'gate_proj', 'q_proj', 'down_proj']\u001b[39m\n",
            "[2024-06-04 05:40:12,423] [DEBUG] [axolotl.load_lora:993] [PID:22017] [RANK:0] Loading pretrained PEFT - LoRA\u001b[39m\n",
            "trainable params: 4,902,912 || all params: 2,511,075,328 || trainable%: 0.1953\n",
            "[2024-06-04 05:40:12,642] [INFO] [axolotl.load_model:843] [PID:22017] [RANK:0] GPU memory usage after adapters: 9.354GB (+0.011GB cache, +0.368GB misc)\u001b[39m\n",
            "[2024-06-04 05:40:12,642] [INFO] [axolotl.scripts.do_merge_lora:144] [PID:22017] [RANK:0] running merge of LoRA with base model\u001b[39m\n",
            "Unloading and merging model: 100% 384/384 [00:00<00:00, 4167.04it/s]\n",
            "[2024-06-04 05:40:12,738] [INFO] [axolotl.scripts.do_merge_lora:153] [PID:22017] [RANK:0] saving merged model to: outputs/out/merged\u001b[39m\n",
            "\u001b[0m"
          ]
        }
      ],
      "source": [
        "!python -m axolotl.cli.merge_lora /content/gemma_axolotl.yaml --lora_model_dir=\"./outputs/out\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hXdJ87eELHaP"
      },
      "source": [
        "### Push model to Hugging Face Hub"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "id": "G3Hf5uWnK_sN"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "8664ec1d4cec4d37908a78a7c9cf1fe7",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "/usr/local/lib/python3.10/dist-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()\n",
            "  return self.fget.__get__(instance, owner)()\n"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[2024-06-04 05:40:52,151] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n"
          ]
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "1f789709d6644e6fb71c5b1becfec2e4",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "model-00003-of-00003.safetensors:   0%|          | 0.00/134M [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "9ac94cea9beb42029187dcc27ebb779e",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "model-00002-of-00003.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "68ad717e82b24110a181450c8cb06b5d",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Upload 3 LFS files:   0%|          | 0/3 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "9ac94608d180470eabbe199d6dc900da",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "model-00001-of-00003.safetensors:   0%|          | 0.00/4.91G [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            },
            "text/plain": [
              "CommitInfo(commit_url='https://huggingface.co/windmaple/gemma-2-finetuned-model-axolotl/commit/eafbcf827d2bc9d77f177d39009e65ae10c455ff', commit_message='Upload model', commit_description='', oid='eafbcf827d2bc9d77f177d39009e65ae10c455ff', pr_url=None, pr_revision=None, pr_num=None)"
            ]
          },
          "execution_count": 8,
          "metadata": {},
          "output_type": "execute_result"
        }
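      ],
      "source": [
        "from transformers import AutoModelForCausalLM\n",
        "\n",
        "# Load the merged model together with its language-modeling head so the\n",
        "# uploaded checkpoint can be used directly for text generation.\n",
        "model = AutoModelForCausalLM.from_pretrained(\"./outputs/out/merged\", local_files_only=True)\n",
        "# Assumes you are already logged in to Hugging Face with a write-access token.\n",
        "model.push_to_hub(\"gemma-2-finetuned-model-axolotl\")"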
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LyvlH1GMJyjI"
      },
      "source": [
        "## Conclusion\n",
        "\n",
        "This notebook demonstrated how to use Axolotl to instruction-tune the Gemma 2B model with LoRA, merge the adapter into the base model, and push the result to the Hugging Face Hub. To finetune with another dataset, see the Axolotl documentation on [Dataset Formats](https://openaccess-ai-collective.github.io/axolotl/docs/dataset-formats/)."
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "name": "[Gemma_2]Finetune_with_Axolotl.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
